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Abstract 

We investigate the behaviour of population models written in Stochastic Concurrent Constraint 
Programming (sCCP), a stochastic extension of Concurrent Constraint Programming. In particular, 
we focus on models from which we can define a semantics of sCCP both in terms of Continuous 
Time Markov Chains (CTMC) and in terms of Stochastic Hybrid Systems, in which some populations 
are approximated continuously, while others are kept discrete. We will prove the correctness of the 
hybrid semantics from the point of view of the limiting behaviour of a sequence of models for increasing 
population size. More specifically, we prove that, under suitable regularity conditions, the sequence of 
CTMC constructed from sCCP programs for increasing population size converges to the hybrid system 
constructed by means of the hybrid semantics. We investigate in particular what happens for sCCP 
models in which some transitions are guarded by boolean predicates or in the presence of instantaneous 
transitions. 

Keywords: Stochastic process algebras; stochastic concurrent constraint programming; stochastic hy- 
brid systems; mean field; fluid approximation; weak convergence 

1 Introduction 

Stochastic Process Algebras (SPA) are a powerful framework for quantitative modelling and analysis of 
population processes [38]. They have been applied in a wide varieties of contexts, including computer 
systems [38], biological systems [24] [15] [23], ecological [58] and crowd [46] modelling. 

However, their standard semantics, given in terms of Continuous Time Markov Chain (CTMC, [49]), 
suffers from the problem of state space explosion, which impedes the use of SPA to analyse models with 
a large state space. A recent technique introduced to tackle this problem is fluid-flow approximation [39], 
which describes the number of system components by means of continuous variables and interprets rates as 
flows, thus providing a semantics in terms of Ordinary Differential Equations (ODE). 

The relationship between these two semantics is grounded on the law of large numbers for population 
processes [44] , first proved by Kurtz back in the seventies [43] . Applying this theoretical framework to SPA 
models, one obtains that the fluid- flow ODE is the limit of the sequence of CTMC models [60, 21, 37], 
obtained by the standard SPA semantics for increasing system size, usually the total number of agents in 
the system. This also provides a link with a large body of mathematical literature on fluid and mean field 
approximation (see e.g. [14] for a recent review). 

These results provide the asymptotic correctness of the fluid semantics and justify the use of ODEs to 
analyse large collective SPA models. Fluid approximation is also entering into the analysis phase in a more 
refined way than just by numerical simulation. For instance, in [36], the authors use fluid approximation 
for the computation of passage-times, while in [13] the fluid approximation scheme is used to model check 
properties of single agents in a large population against CSL properties. 

Despite the remarkable success of fluid approximation of SPA models, its applicability is restricted to 
situations in which all components are present in large quantities, and all events depend continuously on 
the number of the different agent types. This excludes many interesting situations, essentially all those in 
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which some sub-populations have a fixed and small size. This is the case in biological systems, when one 
considers gene networks, but also in computer systems when one models some form of centralized controller. 
Furthermore, the description of control policies is often simplified by using forced (or instantaneous) events, 
happening as soon as certain condition are met, and more generally guard predicates, modulating the set 
of enabled actions as a function of the global state of the system. 

These features of modelled systems are not easily captured in a fluid flow scheme, as they lead naturally 
to hybrid systems, in which discrete and continuous dynamics coexist. To deal with these situations, in 
[17, 18] the authors proposed a hybrid semantics for a specific SPA, namely stochastic Concurrent Constraint 
Programming (sCCP, [15]), associating with a sCCP model a hybrid system where continuous dynamics 
is interleaved with discrete Markovian jumps. In [19], also instantaneous transitions are incorporated in the 
framework. In this way, one can circumvent the limits of fluid-flow approximation, whilst keeping discrete 
only the portions of the system that cannot be safely described as continuous. Roughly speaking, this 
hybrid semantics works by first identifying a subset of system variables to be approximated continuously, 
keeping discrete the remaining ones. The latter set of variables identifies the discrete skeleton of the hybrid 
system, while the former defines the continuous state space. Then, each activity of agents, corresponding 
to a transition that modifies the state of the system, is classified as continuous, discrete/stochastic, or 
discrete/instantaneous. The first class of transitions is used to construct a vector field giving the continuous 
dynamics of the hybrid system (in each mode), while the other two transition classes define the discrete 
dynamics. 

The advantages of working with a hybrid semantics for SPA are mainly rooted in the speed-up that can 
be achieved in the simulation, as discussed e.g. in [18] and [50]. Moreover, the hybrid semantics put at 
disposal of the modeller a broader set of analysis tools, like transient computation [61] or moment closure 
techniques [56, 48]. 

While the theory of deterministic approximation of CTMC is well developed, hybrid approximation has 
attracted much less attention. To the author's knowledge, the preliminary work [11] on which this paper is 
based was the first attempt to prove hybrid convergence results in a formal method setting. There has been 
some previous work on hybrid limits in [4], restricted however to a specific biological example, and in the 
context of large deviation theory [55] , where deterministic approximation of models with level variables has 
been considered (but in this case transitions between modes are fast, so that the discrete dynamics is always 
at equilibrium in the limit). More recent work is [28], which discusses hybrid limits for genetic networks 
(essentially the class of models considered in [11] with some extensions). 

The focus of this paper is to provide a general framework to infer consistence of hybrid semantics of 
SPA models in the light of asymptotic correctness. In doing this, we aimed for generality, proving hybrid 
limit theorems for a framework including instantaneous events, with guards possibly involving model time, 
random resets, and guards in continuous and stochastic transitions. The goal was to identify a broad set of 
conditions under which convergence holds, potentially usable in static analysis algorithmic procedures that 
check if a given model satisfies the conditions for convergence. We will comment on this issue in several 
points in the paper. To author's knowledge, this is the first attempt to discuss hybrid approximation in 
such generality. 

We will start our presentation recalling sCCP (Section 2.1) and the hybrid semantics (Section 2.3). We 
will formally define it in terms of Piecewise Deterministic Markov Processes (Section 2.4, [31]), a class of 
Stochastic Hybrid Processes in which the continuous dynamics is given in terms of Ordinary Differential 
Equations, while the discrete dynamics is given by forced transitions (firing as soon as their guard becomes 
true) and by Markovian jumps, firing with state dependent rate. The hybrid semantics is defined by 
introducing an intermediate layer in terms of an automata based description, by the so-called Transition- 
Driven Stochastic Hybrid Automata (TDSHA, Sections 2.2 and 2.5, [17, 18]). 

After presenting the classic fluid approximation result, recast in our framework (Section 4), we turn 
our attention to sCCP models that are converted to TDSHA containing only discrete/stochastic and 
continuous transitions, with no guards and no instantaneous transitions, but allowing random resets (general 
for discrete/stochastic transitions and restricted for continuous ones). In Section 5, we prove a limit theorem 
under mild consistency conditions on rates and resets, showing that the sequence of CTMC associated with 
a sCCP program, for increasing system size, converges to the limit PDMP in the sense of weak convergence. 
Technically speaking, the appearance of weak convergence instead of convergence in probability, in which 
classic fluid limit theorems are usually stated, depends on the fact that the limit process is stochastic and 



2 



can have discontinuous trajectories. 

We then turn our attention to the limit behaviour in the presence of sources of discontinuity, namely 
instantaneous transitions (Section 6) or guards in continuous (Sections 7.2 and 7.1) or discrete/stochastic 
transitions (Section 7.4). 

In all these cases, the situation is more delicate and the conditions for convergence are more complex. 
Guards in continuous transitions introduce discontinuities in the limit vector fields, requiring us to define the 
continuous dynamics in terms of the so-called piecewise-smooth dynamical systems [26] or, more generally, 
in terms of differential inclusions [3]. Here, however, we can exploit recent work in this direction [20, 35], and 
the hybrid convergence theorem extends easily, provided we can guarantee global existence and uniqueness 
of the solutions of the discontinuous differential equations. 

The situation with guards for discrete/stochastic transitions and with instantaneous events is even more 
delicate: subtle interactions between the continuous dynamics and the surfaces in which guards can change 
truth status (called discontinuity or activation surfaces in the paper) can break convergence. We discuss 
this in detail first for instantaneous transitions (Section 6) and then for guards in discrete/stochastic tran- 
sitions (Section 7.4). In these sections, we identify regularity conditions to control these subtle interactions, 
extending the convergence also to this setting. However, checking these conditions is more complicated, 
because they essentially impose restrictions on the global interactions between the vector fields and the 
discontinuity surfaces. A way out of this problem, hinted in the conclusions (Section 8) is to increase the 
randomness in the system by adding noise on resets and initial conditions or on the continuous trajecto- 
ries, i.e. considering hybrid limits with continuous dynamics given by Stochastic Differential Equations or 
Gaussian Processes [42] . In the conclusions we will also comment on the applicability of our results to the 
stationary behaviour of the CTMC. Throughout the paper, starred remarks contain material that can be 
skipped at a first reading. 

2 Preliminaries 

In this section, we introduce preliminary concepts needed in the following. We will start in Section 2.1 
by presenting sCCP, the modelling language that will be used in the paper. We will then introduce 
Transition-Driven Stochastic Hybrid Automata (TDSHA, Section 2.2), an high level formalism to model 
the limit hybrid processes of interest, namely Piecewise Deterministic Markov Processes (PDMP, Section 
2.4). Finally, we will consider also how to define a hybrid semantics for sCCP by syntactically transforming 
a sCCP model into a TDSHA (Section 2.3) and a TDSHA into a PDMP (Section 2.5). 

2.1 Stochastic Concurrent Constraint Programming 

We briefly present stochastic Concurrent Constraint Programming (sCCP, a stochastic extension of CCP [54]) 
In the following we just sketch the basic notions and the concepts needed in the rest of the paper. More 
details on the language can be found in [10, 15]. 

sCCP programs are defined by a set of agents interacting asynchronously and exchanging information 
through a shared memory called the constraint store. The constraint store consists of a set of variables plus a 
set of constraints, which are first order predicates restricting the admissible domain of variables. By adding 
constraints to the store, agents refine the available information. In this paper, we consider a restricted notion 
of constraint store, containing only stream variables, i.e. variables "a la Von Neumann" which have a single 
value at any given time, and can be updated during the computation 1 . We further restrict the language by 
forbidding local variables. This restricted version of sCCPhas proved to be sufficiently expressive, compact, 
and especially easy to manipulate for our purposes, in particular for what concerns the definition of the 
fluid [21] and the hybrid semantics [17, 18]. In this paper, however, we enlarge the primitives at our disposal 
with respect to [21, 18], as done in [19], by including also instantaneous transitions, random resets, and 
environment variables (which can take values in an uncountable set). 

Definition 2.1. A sCCP program is a tuple A — (A, Def , X, 2?, x ), where 

1 Formally, one can view these variables as list, so that new values are appended at the end of the list, see [15] for further 
details. 
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1. The initial network of agents A and the set of definitions Def are given by the following grammar: 



Def = | Def U Def | {C = M} 

A = | C | A || A M = tt.A \ M + M 

it = fo(X) «(X,X',W)] A(X) I fo(X) -+ u(X,X', WJJoo^x) 



2. X is the set of stream variables of the store (with global scope). A variable X e X takes values in 
T>x- Variables are divided into two classes: model or system variables whose domain T>x has to be 
a countable subset of K (usually the integers), and environment variables, whose domain can be the 
whole R. The state space of the model is therefore V = Jlxex ^x', 

3. x is the initial value of store variables. 

System variables usually describe the number of individuals of a given population, like the number of 
molecules in a biochemical mixture or the number of jobs waiting in a queue. Environment variables, on the 
other hand, arc useful to describe properties of the "environment", like the temperature of a biochemical 
system, or the value of a controllable parameter that may change at run-time. Examples of the use of 
environment variables will be given in Section 5. 

In the previous definition, a basic action tt (called throughout the paper also event or transition) is a 
guarded update of (some of the) store variables. In particular: 

• the guard ff(X) is a quantifier-free first order formula whose atoms are inequality predicates on variables 



• the update u(X, X') is a predicate on X,X', a conjunction of atomic updates of the form f\X' = 
r(X, W) (where X' denotes variable X after the update), where each variable X' appears only once. 
Here r is a function with values in T>x, and can depend on the store variables X and on a random 
vector W in R h (for some h > 0), which can also depend on the current state of variables X. Updates 
will be referred to also as resets. 

• The rate function A : V — > M>o is the (state dependent) rate of the exponential distribution associated 
with 7r, which specifies the stochastic duration of tt; 

• if, instead of A, an action 7r is labelled by oo : p(X), it is an instantaneous action. In this case, 
p : V — »■ R> is the weight function associated with the action. 

Updates can be seen as (random) functions from T> to itself, and they can be very general. However, in the 
following we will need to restrict them in order to define the fluid semantics. An atomic reset is a constant 
increment update if it is of the form X' = X + k, with k e K such that X' e T>x whenever X e T>x (usually 
X, k £ Z) and it is a random increment update if it is of the form X' = X + /i, with fi a random number, 
such that \fj,\ has finite expectation. An update is a constant /random increment if all its atomic updates 
are so. 

Example 2.1. We introduce now a simple example that will be used for illustrative purposes throughout 
the paper. We will consider a simple client-server system, consisting of a population of clients which request 
a service and, after having obtained an answer, process it for some time before asking for another service, 
in a loop. The servers, instead, reply to client's request at a fixed rate. We ignore any internal behaviour of 
servers, for simplicity. However, servers can break down and need some time to be repaired. We can model 
such system in sCCP by using 4 variables, two counting the number of clients requesting a service (X r ) 
and processing data (X t ), and two modelling the number of idle servers ready to reply to a request (Xj) 
and the number of broken servers (Xb). The initial network is then client || server, with initial conditions 
X r = Xb = 0, X t = Ni, and = N 2 . The client and server agents are defined as follows (* stands for 
true): 



X: 



client = [* -> X' r 

def r -»w 

server = [* — > X ■ 



+ [*^x\ 



X r — 1 A X[ 
X r + 1 A X' t 
X t - 1 A X' b 
X t + 1 A X' b 



X t - l]fe t x t -client 
X b + l]k b Xi .server 
X b - l]k f x b - server 



X t + l]min{k r x r ,fc„x i } • client + 
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Note in the previous code how the rate at which information is processed by clients corresponds to the 
global rate of observing an agent finishing its processing activity. Observe also that we defined the service 
rate as the minimum between the total request rate of clients and the total service rate of servers. This use 
of minimum is consistent with the bounded capacity interpretation of queueing theory and of the stochastic 
process algebra PEPA [38]. This global interaction-based modelling style is typical of sCCP, see [15] for 
a discussion in the context of systems biology. Furthermore, although wc want to keep all variables > 0, 
we are not using any guard in the transitions. However, non-negativity is automatically ensured by rates, 
which, by being equal to zero, disallow transitions that would make one variable negative. 

In order to simplify the definition of the fluid and hybrid semantics, we will work with a restricted 
subclass of sCCP programs, that we will call flat. A flat sCCP program satisfies the following two 
additional restrictions: (a) each component C is of the form C = 7r 1 .C + . . . + n h .C, i.e. it always calls itself 
recursively, and (b) the initial network A is the parallel composition of all components, i.e. A =||ceDef C. 
Note that the client-server model of Example 2.1 is flat. 

The requirement of being flat is not a real restriction, as each sCCP program respecting Definition 
2.1 can be turned into an equivalent flat one, by adding fresh variables counting how many copies of each 
component C arc in parallel in the system. Guards, resets, rates and priorities have to be modified to 
update consistently these new variables, (see Appendix B for an example) 

In the following definitions, we will always suppose to be working with flat sCCP models, possibly 
obtained by applying the flattening recipe. Given a (flat) sCCP model A = (A, Def, X,X>,x ), we will 
denote by action s (C) the set of stochastic actions of a component C and by actiorij(C) the set of its 
instantaneous actions. We will use the following notation: 

• For an action n £ action s (C) U action^C), we denote by guard[-7r](X) or g T (X) its guard. 

• For an action n £ action s (C) U action^C), we denote by reset[7r](X, W) or iv(X,W) its update 
function (so that X' = r w (X,W)). 

• For an action n £ action ,,(C) U actiorii(C), if ir has a constant increment update, we will denote the 
increment vector by (so that X' = X + k ff ), while if n has a random increment update, we will 
denote it by fi v . We also let be either k T or //„.. 

• For an action n £ action s (C), we denote by rate[-7r] or A w its rate function. 

• For an action n £ action^C), we denote by weight^] or its weight. 

A sCCP program with all transitions stochastic can be given a standard semantics in terms of Continu- 
ous Time Markov Chains, in the classical Structural Operational Semantics style, along the lines of [15]. For 
a flat sCCP model, the derivation of the labelled transition system (LTS) is particularly simple. First, the 
state space of CTMC corresponds to the domain V of the sCCP variables. Secondly, each stochastic action 
7r £ action^C) of a component C defines a set of edges in the LTS. In particular, if in a point x it holds that 
g7r(x) is true and P{r 7r (x ; W) = y} = p y > 0, then we have a transition from x to y with rate p y A T (x). As 
customary, the rates of the edges of the LTS connecting the same pair of nodes are summed up to get the 
corresponding rate in the CTMC. Instantaneous transitions, on the other hand, can be dealt with in the 
standard way as in [45] , by partitioning states of V into vanishing (in which there is an active instantaneous 
transition) and non- vanishing (in which there is no active instantaneous transition) , and removing vanishing 
states in the LTS, solving probabilistically any non-deterministic choice between instantaneous transitions 
with probability proportional to their weight. 

We will indicate by X(i) the state at time t of the CTMC associated with a sCCP program A with 
variables X. 

If all transitions of a sCCP program are stochastic and have constant increment updates, they can be 
interpreted as flows, and a fluid semantics can be defined [21]. However, to properly deal with random 
resets and instantaneous transitions, it is more convenient to consider a more general semantics for sCCP, 
in terms of stochastic hybrid automata [17, 18, 19]. This approach will also allow us to partition variables 
and transitions into discrete and continuous, so that only a portion of the state space will be approximated 
as fluid. 
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2.2 Transition-driven Stochastic Hybrid Automata 

Transition-Driven Stochastic Hybrid Automata (TDSHA, [17, 18]) has proved to be a convenient intermedi- 
ate formalism to associate a Piecewise Deterministic Markov Process with a sCCP program. The emphasis 
of TDSHA is on transitions which, as always in hybrid automata, can be either discrete (corresponding to 
jumps) or continuous (representing flows acting on system variables). Discrete transitions can be of two 
kinds: either stochastic, happening at random jump times (exponentially distributed), or instantaneous, 
happening as soon as their guard becomes true. 

In this paper, we consider a slight variant of TDSHA, in which discrete modes of the automaton are 
described implicitly by a set of discrete variables (variables taking values in a discrete set), rather than 
explicitly. This syntactic variant is similar to the one used in [12], and is introduced in order to simplify 
the mapping from flat sCCP models. 

Definition 2.2. A Transition-Driven Stochastic Hybrid Automaton (TDSHA) is a tuple 
T = {Z,Q,Y,T€,1D,ie,init), where: 

• Z = {Zi, . . . ,Zk} is the set of discrete variables, taking values in the countable set Q C R k . Each 
value q e Q, q = (z\, ...,Zk) is a control mode of the automaton. 

• Y = {Y 1 , . . . , Y n } is a set of real valued system variables, taking values in R™. We let X = Z U Y be 
the vector of TDSHA variables, of size m = k + n? 

• T£ is the multi-set 3 of continuous transitions or flows, containing tuples n = (k, /), where k is a 
real vector of size m (identically zero on components corresponding to Z), and / : Q x R™ — > R is a 
piecewise continuous function for each fixed q € Q (usually, but not necessarily, Lipschitz continuous 4 ). 
We will denote them by v v , and fj,, respectively. 

• TS is the multi-set of discrete or instantaneous transitions, whose elements are tuples rj = (G,R,p), 
where: p : Q x R" — >• M> is a weight function used to resolve non-determinism between two or more 
active transitions, G is the guard, a quantifier-free first-order formula with free variables among X, 
and R is the reset, a conjunction of atoms of the form X' = r(X, W), where r : Q x R" x M. h — > R, is 
the reset function of X, depending on variables X as well as on a random vector W in R . Note that 
the guard can depend on discrete variables, and the reset can change the value of discrete variables Z. 
In the following, we will interpret the reset as a vector of k + n functions, R : Q x R™ x R h — > Q x R n , 
equal to X' = r(X,W) in the component corresponding to X if X' = r(X,W), and equal to the 
identity function for all those variables X unchanged by the reset. The elements of a tuple n are 
indicated by g,,, r v , and p,,, respectively. 

• T© is the multi-set of stochastic transitions, whose elements are tuples n — (G, R, A), where G and R 
are as for transitions in ID, while A : Q x R" — > R + is a function giving the state-dependent rate of 
the transition. Such a function is indicated by A^. 

• init = (z ,yo) £QxM™ is the initial state of the system. 

A TDSHA has three types of transitions. Continuous transitions represent flows and, for each n € 1€, 
v„ and i v give the magnitude and the form of the flow of n on each variable Y S Y, respectively (see also 
Section 2.5). Instantaneous transitions n e ID, instead, are executed as soon as their guard g v becomes 
true. When they fire, they can reset both discrete and continuous variables, according to the reset policy r v , 
which can be either deterministic or random. Weight is used to resolve probabilistically the simultaneous 
activation of two or more instantaneous transitions, by choosing one of them with probability proportional 
to p^. Finally, stochastic transitions v e T6 happen at a specific rate A^, given that their guard g n is true 
and they can change system variables according to reset r^. Rates define a random race in continuous time, 
giving the delay for the next spontaneous jump. 

The dynamics of TDSHA will be defined in terms of PDMP, see Section 2.5 or [17, 18] for a more detailed 
discussion. 

2 Notation: the time derivative of Yj is denoted by Yj, while the value of Yj after a change of mode is indicated by Y-. 
3 Multi-scts arc needed to take into account the proper multiplicity of transitions. 

4 A function / : R m — > R is Lipschitz continuous if and only if there is a constant L > 0, such that ||/(xi) — /(x2)|| < 
L||x 1 -x 2 || 
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Composition of TDSHA. We consider now an operation to combine two TDSHA with the same vectors 
of discrete and continuous variables, by taking the union of their transition multi-sets. Given two TDSHA 
Ti = (Z,Q,Y,T£i,T2)i,T©i,imi) and T 2 = (Z,Q,Y, 

j < %&2, init), agreeing on discrete and continuous variables and on the initial state, their com- 
position T = Ti W T 2 is simply T = (Z,Q, Y,T£i U 1€ 2 ,^i U T£> 2 ,T6i U 1& 2 ,init), where the union 
symbol U refers to union of multi-sets. 

2.3 From sCCP to TDSHA 

In this section we recall the definition of the semantics for sCCP in terms of TDSHA [18]. We will assume 
to work with flat sCCP models, so that we can ignore the structure of agents and focus our attention 
on system variables. In this respect, this approach differs from the one of [18], but it provides a more 
homogeneous treatment. 

The mapping proceeds in three steps. First we will partition variables into discrete and continuous. 
Then, we will convert each component into a TDSHA, and finally we will combine these TDSHA by the 
composition construction defined in the previous section. 

The first step is to consider a flat sCCP model A = (A, Def , X, V, x ), and partition its set of variables 
X. Recall that variables X are divided into model variables X m and environment variables X e . Model 
variables X m are partitioned into two subsets: X<j, to be kept discrete, and X c , to be approximated 
continuously. Hence X = X^ U X c U X e . How to perform this choice depends on the specificity of a given 
model: some guidelines will be discussed in Remarks 2.1 and 5.1. We stress here the double nature of 
environment variables: they will be treated like discrete variables in terms of the way they can be updated, 
but as continuous variables for what concerns their domain, i.e. they are part of the continuous state space 
of the TDSHA. 

Once variables have been partitioned, we will process each component C € Def separately, subdividing 
its stochastic actions action s (C) into two subsets: disc(C), those to be maintained discrete, and cont(C), 
those to be treated continuously. This choice confers an additional degree of freedom to the mapping, but 
has to satisfy the following constraint: 

Assumption 1. Continuous transitions must have a constant increment update or a random increment 
update, i.e. = X + i^- Furthermore, their reset cannot modify any discrete or environment variable, i.e. 
u n [X] = 0, for each X g X d U X e . 

We will now sketch the main ideas behind the definition TDSHA associated with a component C. 

Continuous transition. With each ir e cont(C), we associate 77 e T£ with rate function fr,(X) = 
I{g7r(X)} • A 7r (X), where I{-} is the indicator function of the predicate g ff (X), equal to 1 if g7r(X) is 
true, and to zero if it is false. The update vector is k^, if tt has a constant increment update. If ir has 
random increment fi„, we define the update vector as E[/x w ], the expected value of the random vector 

Stochastic transitions. Stochastic transitions arc defined in a very simple way: guards, resets, and rates 
are copied from the sCCP action n g disc(C). 

Instantaneous transitions. Instantaneous transitions are generated from sCCP instantaneous actions 
ir g actiorij(C), by copying guards, resets and priorities. 

We can define formally the TDSHA of a sCCP component as follows: 

Definition 2.3. Let A = (A, Def, X, T>, x ) be a flat sCCP program and (Xrf,X c ,X e ) be a partition 
of the variables X. Let C be a component, with stochastic actions action s (C) partitioned into disc(C) U 
cont(C), in agreement with Assumption 1. The TDSHA associated with C is T(C, disc(C), cont(C)) = 
(Z, Q, Y, 1<t, ID, T6, init), where 

5 Alternatively, we could have considered the support {/^, ...,//*,...} of fj,-n, with probability density P(/z^), . . . , P(^), . . ., 
and generated a family of continuous transitions with rate P(//*)f 7) (X) and update vector However, if we add up these 
transitions as required to construct the vector field (see Section 2.5), we obtain E[^i 7r ]f J) (X), i.e. the two approaches are 
equivalent. 
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Figure 1: TDSHA associated with client and server components of Example 2.1, together with their 
composition. Continuous transitions are rendered into a set of ODEs, as explained in Section 2.5. 

• Z is equal to X^, while Y = X c U X e . Q is the domain of X^ in A, and init = Xo- 

• With each ir E cont(C) with constant increment reset rv = X + k^, we associate rj = (k^,^) G T£, 
where f„(X) = lfe(X)} • A„(X). 

• With each n e cont(C) with random increment reset r„. = X + fx„, we associate rj = (E[// T ], f^) € TC, 
where f,, is defined as above. 

• With each n e disc(C) we associate r\ = (g ff (X), r ff (X), A ff (X)) G 16. 

• With each n e action; ((7) we associate 77 = (g 7r (X),r 7r (X),p 7r (X)) e T£>. 

Finally, the TDSHA of the whole sCCP program is obtained by taking the composition of the TDSHA 
of each component, as follows: 

Definition 2.4. Let A = (A, Def,X,2?,Xo) be a flat sCCP program and (Xd,X c ,X e ) be a partition of 
variables X. The TDSHA T(A) associated with A is 

T{A)= l+J T{C, disc(C),cont(C)). 

CGDef 

Example. Consider the sCCP program of Example 2.1. The TDSHA associated with its two components 
(client and server) and their composition are shown in Figure 1. In this case, we partitioned variables 
by making all client variables continuous, i.e. X r and X t , and all server variables discrete, i.e. X{ and X\,. 
This describes a situation in which there are few servers that have to satisfy the requests of many clients. 
Consequently, we considered all client transitions as continuous and all server transitions as discrete. 

Remark 2.1. Choosing how to partition variables into discrete and continuous is a complicated matter, 
and depends on specific features of the model under study. We postpone a more detailed discussion on 
this issue to Remark 5.1 in Section 5, as this choice can depend on the notion of system size, which has 
still to be introduced. Here we just note that a non-flat sCCP model may naturally suggest a candidate 
subset of variables to be kept discrete, namely state variables of a sequential sCCP component (i.e. an 
agent changing state but never forking or killing itself) present in a single copy in the network. This is 
the approach followed e.g. in [17, 19] to define the control modes of the TDSHA. However, the approach 
presented here is more general: different partitions of model variables and stochastic transitions lead to 
different TDSHA, which can be arranged in a lattice, as done in [18]. 



8 



2.4 Piecewise Deterministic Markov Processes 



The dynamical evolution of Transition Driven Stochastic Hybrid Automata is defined by mapping them to a 
class of stochastic processes known as Piecewise Deterministic Markov Processes (PDMP, [31]). They have 
a continuous dynamics given by the solution of a set of ODE and a discrete and stochastic dynamics given 
by a Markov jump process. The following definition deviates slightly from the classical one for PDMP in 
the way the discrete state space is described. 

Definition 2.5. A PDMP is a tuple (Z,Q, Y, E, 4>, X, R), such that: 

• Z is a set of discrete variables, taking values in the countable set Q cR k , the set of modes or discrete 
states. (Hence q e Q is of the form (z\, . . . , Zk)-) Y is a vector of variables of dimension |Y| = n. 
For each q G Q, let E q C M™ be an open set, the continuous domain of mode q. E, the hybrid state 
space, is defined as the disjoint union of E q sets, namely E = \J ql£ Q{q} x E q . A point x e E is a pair 
x = (q,y), y € E. 6 In the following, we will denote Z U Y by X, so that variables X range over E. 

• With each mode q e Q we associate a vector field F q : E q — >• R n . The ODE y = F q (y) is assumed 
to have a unique solution starting from each yo G E q , globally existing in E q (i.e., defined until the 
time at which the trajectory leaves E q ). The (semi)flow <p q : E q x M+ — >■ W l of such vector field is 
assumed to be continuous in both arguments. <p q {t,yo) denotes the point reached at time t starting 
from y E E q . 7 

• A : E — > R + is the jump rate and it gives the hazard of executing a discrete transition. It is assumed 
to be (locally) integrable. 

• R : E U dE x B — > [0, 1] is the transition measure or resei kernel. It maps each y e E Li dE on 
a probability measure on (E,B), where ,8 is the Borel cr-algebra of E. R(x,A) is required to be 
measurable in the first argument and a probability measure for each x. 

The idea underlying the dynamics of PDMP is that, within each mode q, the process evolves along the 
flow <j) q . While in a mode, the process can jump spontaneously with hazard given by the rate function \. 
Moreover, a jump is immediately performed whenever the boundary of the state space of the current mode 
is hit. 

In order to formally capture the evolution, we need to define the sequence of jump times and target 

states of the PDMP, given by random variables 7i,xi,72,X2, Let i*(x) = inf{£ > | <f) q (t,x) e dE q } 

(with inf0 = oo) be the hitting time of the boundary dE q starting from x = (q, y) e E. We can define 
the survivor function of the first jump time T\, given that the process started at x = {q,y), by F(t,x) = 

P(Ti >t) = / t<t . (x) cxp (- / o 'A(g,0,(«,x))ds). 

This defines the probability distribution of the first jump time 7\, which can be simulated, as customary, by 
solving for t the equation F(t, x) = [/-J, with U\ uniform random variable in [0, 1]. Once the time of the first 
jump has been drawn, we can sample the target point \i of the reset map from the distribution i?(x^, •), 
with xj^ = (j) q (Ti,x), using another independent uniform random variable U\. From \\ = (qi,x\), the 
process follows the flow <p qi (t — T\, xi), until the next jump, determined by the same mechanism presented 
above. 

Using two independent sequences of uniform random variables U]q and we are effectively construct a 
realization of the PDMP in the Hilbcrt cube [0, 1]". A further requirement is that, letting N t — l{t > T^} 
be the random variable counting the number of jumps up to time t, it holds that N t is finite with probability 
1, i.e. Vt,EA t < oo, see [31] for further details. If this holds, then the PDMP is called non-Zeno. 

Remark 2.2. In [11], we proved some limit results restricting the attention to the case in which no instanta- 
neous jump can occur. This amounts to requiring that each E q has no boundaries, i.e. E q = R", or, more 
precisely, that i*(x) = oo for each xeE If, in addition to this description, we also require the vector field 
to be Lipschitz continuous and the stochastic jumps to be described by a finite set of transitions r\ with rate 
A^ and reset given by a constant increment v n , we obtain the so called simple PDMP [11]. 

6 Sec Appendix C for a brief discussion on metric and topological properties of hybrid state spaces. 

7 Usually, F q is locally Lipschitz continuous, hence the solution exists and is unique, provided trajectories do not explode in 
finite time. However, as in the paper we will consider also situations in which the vector field con be discontinuous due to the 
presence of guards, we have chosen this more general condition. 
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2.5 From TDSHA to PDMP 



The mapping of TDSHA into PDMP is quite straightforward, with the exception of the definition of the 
reset kernel. Essentially, the problem lies in the fact that each discrete transition of a PDMP has to jump 
in the interior of the state space E, which will be defined as the set of points in which no guard of any 
instantaneous transition is active. However, in a TDSHA we do not have any control over this fact, and 
we may define guards of transitions r\ € ID in such a way that an infinite sequence of them can fire in the 
same time instant. For instance, the transitions (X >= 1,X' = 0,1) and (X <= Q,X' = 1,1) will loop 
forever if one of them is triggered. In order to avoid such bad behaviours, we will forbid by definition the 
possibility that two discrete transitions fire in the same time instant. We will call chain-free a TDSHA with 
this property. This condition is unnecessarily restrictive, and can be relaxed allowing the firing of a finite 
number of finite sequences (loop-free TDSHA), as done in [18], but it allows a simpler definition of the reset 
kernel of the PDMP. The interested reader is referred to [18] for the construction of the reset kernel for 
loop-free TDSHA. The good news here is that all the results in this paper extend immediately to loop-free 
TDSHA. The bad news is that checking if a TDSHA is loop-free is in general undecidable, as one can easily 
encode an Unlimited Register Machine in a TDSHA [18]. However, some sufficient conditions in terms 
of acyclicity of a graph constructed from transitions in ID have been discussed in [34]. Practically, most 
models will satisfy the chain-free condition, as the discrete controller described by instantaneous transitions 
is usually simple. More advanced controllers will perform some form of local computation, which can then 
result in a loop-free model. Violation of the loop-free property, instead, usually indicates an error in the 
model. 

We now briefly introduce some notation, and then define chain-free TDSHA and the PDMP associated 
with a chain-free TDSHA. 

Let T = (Z,Q, Y,1€,1D,ie,init) be a TDSHA. Given a transition r\ e ID, we let G v = {x e 
Q x R™ | g,,(x) true}, and R v = {x e Q x R" | x € r r) (G r) , W)}. R v is the set of points that can be reached 
after the firing of n, defined as the image under r,, of the closure G v of the activation set G v of the guard. 
Similarly, for r\ G 1&, we let R v = {x e Q x R" | x e r^({xi | A^(xi) > 0}, W)}, the set of points that can 
be reached by a stochastic jump. 

Definition 2.6. A TDSHA is chain-free if and only if, for each 771 e ID U T6 and each i] 2 G T6, 
R m n G m = 0. 

Consider now a chain-free TDSHA T = (Z, Q, Y, %£, ID, 16, init). Then, its associated PDMP V = 
(Z,Q,Y,E,<(>,X,R) is defined by: 

• Discrete and continuous variables, and discrete modes Q, are the same both in T and in V. 

• The state space of the PDMP, encoding the invariant region of continuous variables in each discrete 
mode, is defined as the set of points in which no instantaneous transition is active: 

n k 

Note that E q is open, because we are intersecting the complement of the closure of each set G v . 

• The vector field is constructed from continuous transitions, by adding their effects on system variables: 

F(x) = 2 !/„ • f,(x). (1) 

• The rate function A is defined by adding point-wise the rates of all active stochastic transitions: 

A(x) = ]T I{g t) (x)}A r ,(x). (2) 

r)E%6 



10 



• The reset kernel R for x G E is obtained by choosing the reset of one active stochastic transition in 
x with a probability proportional to its rate. As all such resets jump to points in the interior of E by 
the chain-free property of the TDSHA, we have 

R^A) = £ ( I ig^^ FK(x,W) € A}) , (3) 

1)616 ^ ^ ' ' 

where A £ B, the Borel cr-algebra of E. If the reset of r\ is deterministic, then P{r n (x, W) e A] = 
5 r>) (x,w)> where 5 Xl (A) is the Dirac measure on the point xi G E, assigning probability 1 to xi and 
to the rest of the space. 

• The reset kernel R on the boundary dE is defined from resets of instantaneous transitions. If more 
than one transition is active in a point x <G dE, we choose one of them with probability proportional 
to their weight. Let p(x) = Y. v e%v | g„(x) t™ e Pr/( x )> thcn 

R(x,A)= J2 (^!§ V{r ^ W) e A} ) ■ (4) 

iielD I g„(x) true ^ P ^ ' ' 

• The initial point is x = init. 

From now on, we implicitly assume that all the TDSHA obtained by the sCCP models we consider 
are chain-free. In general this may not be true and has to be checked. However, the property will hold 
straightforwardly in all the examples of this paper, and it will also be true in many practical examples. 
Indeed, as it is enough to consider loop- free TDSHA [18], this check may be automatically performed by 
the method of [34]. 

3 System Size and Normalisation 

In this paper we are concerned with the correctness of the hybrid semantics of sCCP in terms of approx- 
imation or limit results. Essentially, we want to show that "taking the system to the limit" , the standard 
CTMC semantics of sCCP converges (in a stochastic sense) to the PDMP defined by the hybrid semantics. 

Clearly, this idea of convergence requires us to have a sequence of models. This sequence will depend 
on the size 7 of the system. Hence, we will be concerned with the behaviour of a sCCP program, when the 
system size goes to infinity. 

The concrete notion of system size depends on the model under examination and the type of system 
being modelled. In general, it is related to the size of the population, intended as the number of agents or 
entities in the system (which in flat sCCP models are counted by the system variables). For instance, in the 
client/server example (Example 2.1), this can be the total number of clients or the total number of clients 
and servers. In an epidemic model, this can be the size of the total population, or of the initial population, 
if we allow also birth and death events. However, the notion of system size can also be connected to the size 
of the population in an area or a volume. In this case, when the size increases, both the number of agents 
and the area or volume increase, usually keeping constant the density (number to area or volume ratio). 
The classical examples here are biochemical systems, in which we consider molecules in a given volume. 
Furthermore, in a model of bacteria's growth (like that of Example B.l), we may be interested in increasing 
the number of bacteria together with the area of the Petri dish in which the culture is grown. 

In order to make the notion of size explicit, we will decorate a sCCP model with the corresponding 
population size. 

Definition 3.1. A population-sCCP program (-4,7) consists of a sCCP program A together with the 
population size 7 e R+. 

It is intended that rates of transitions, and even updates, of a population-sCCP program can depend 
on the population size 7. We further stress that, in a population-sCCP program, model variables usually 
take integer values. 
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Example 3.1. We go back to the client-server model of Example 2.1, and consider the population-sCCP 
model in which the size 7 corresponds to the total population of clients and servers, namely 7 = N\ + N2 = 
X r (0)+X t (0)+Xi(0)+Xfc(0). In this scenario, we are interested in what happens when the total population 
increases, maintaining constant the client-to-server ratio. 

A different notion of size can be envisaged, corresponding to a different scaling law. More specifically, we 
can consider 7 = Ni = X r (0) + X t (0), the total number of clients in the system. Increasing this notion of 
size, we are effectively increasing the number of clients requesting information to a fixed number of servers. 
Intuitively, these two different scalings for the client-server system should correspond to two different limit 
behaviours (taking 7 to infinity). 

In order to compare models for increasing values of the size 7, we need to normalize models to the same 
scale. This is done by the normalization operation. Essentially, we will divide system variables by the 
system size (in fact, only those that will be approximated continuously), and express guards, rates, and 
resets in terms of such normalized variables. 

We formalize now the operation of normalization. Consider a population-sCCP program (.4,7), with 
A = (A, Def , X, T>, xo), let X(t) be the associated CTMC, and assume that variables X are partitioned into 
discrete X^, continuous X c , and environment variables X e . Then the normalized CTMC X(i) is constructed 
as follows: 

• Normalized variables are X = (X d ,X c ,X e ), with X c = 7 _1 X C ; 

• Given a stochastic action tt = (g 7r (X), r ff (X), A ff (X), we define: 

- g,r(X) = g ff (X), the guard predicate with respect to normalized variables; 

- Let X' = r„[X](X,W). If X E X d U X e , then r v [X](±,W) = r ff [X](X, W). Otherwise, if 
X G X c , then f w [X](X,W) = 7~ 1 r 7r LY](X, W) (hence, we replaced X c variables with their 
normalized counterpart in the reset function, but also rescaled the reset of X c variables by 
dividing the reset function for 7); 

- A W (X) = A,(X). 

• Instantaneous transitions are rescaled in the same way (expressing the weight function p in terms of 
normalized variables like the rate of stochastic transitions) ; 

• Normalized initial conditions arc xo = (x^.o, 7 x Ci o, x e ,o). 

Applying this transformation to a sCCP program, we can construct the normalized CTMC X(t) along 
the lines of the construction of Section 2.1. Furthermore, we can construct the TDSHA associated with a 
sCCP program by considering normalized transitions and variables, instead of non-normalized ones. As we 
will always compare normalized processes, we will always assume that this construction has been carried 
out. 

Given a population-sCCP program (A, 7), our goal is to understand what will be the limit behaviour of 
a sequence of normalized CTMC X.( N \t), constructed from (.4,7) and a sequence of system sizes 7jv — > 00 
as N — > 00. In order to properly do this, we need to get a better grasp on some related questions, namely: 

1. how to split variables into discrete and continuous; 

2. how rates and updates scale with the system size. 

These two questions are somehow dependent; the last one, in particular, is crucial, as the correct form of the 
limit depends on the scaling of rates and updates. Investigating these issues, moreover, will give us some 
hints on how to choose discrete and continuous variables and transitions to define the hybrid semantics of 
sCCP. 

We will start by considering the fluid case, in which all variables are approximated as continuous. Rates 
and updates will be required to scale in a consistent way, and we will refer to these conditions as the 
continuous scaling. Then, we will turn our attention to hybrid scaling and hybrid limits. 
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Before doing this, we stress that the normalization operation extends naturally to the TDSHA associated 
with a population-sCCP program and, consequently, to the PDMP associated with the so-obtained TDSHA. 
In particular, if we have a sequence (.4, jn) of population-sCCP models, we can construct its normalization 
for each N, and associate a TDSHA with each element of the sequence. We call T(A, 7at) such a TDSHA. 
However, in the rest of the paper we are interested in the limit behaviour, i.e. in models independent of 7^. 
The scaling conditions for each transition that we will introduce will naturally lead to the construction of 
a limit TDSHA, independent of any notion of size, referred to as T(A) in the rest of the paper. 

4 Continuous Scaling and Fluid Limit 

We discuss now the standard fluid limit [43] [44] [42] [29] [30] in our context. We will consider a sequence 
of population-sCCP programs (A,jn) with divergent population size — > 00 as N — > 00. 
In the rest of this section, we will require the following assumptions: 

• All variables X are continuous and thus normalized according to the recipe of the previous section 
(hence there are no discrete or environment variables). 

• There is no instantaneous transition in A. 

• All stochastic transitions are unguarded and have constant or random increment updates. 

In order to define the continuous scaling, we consider the domain E C M m of normalized variables (note 
that here E is not a hybrid state space), which depends on possible values that non- normalized variables 
can take in R m (usually in Z m , see also Remark 4.2 below). In particular, we assume that E contains the 
domain of the normalized variables of a population-sCCP program (A, 7at) for any N > 0, so that also the 
limit process will be defined in E. 

Now we state the continuous scaling assumptions: 

Scaling 1 (Continuous Scaling). A normalized sCCP transition tt = (true, X' = ± + i>i N) ,xi N) (±)) of a 
population-sCCP program (A, 7iv), with E C R" 1 the domain of normalised variables X, has continuous 
scaling if and only if: 

1. there is a function gi N ^ : E — » M>o such that A^^X) = jNgi^CK). Furthermore, gi N ^ converges 
uniformly to a locally Lipschitz continuous and locally bounded function g n : E — > R> (rates are 

0(7*)); 

2. There is a constant or random vector z/ w £ R m such that the non-normalized increments v^ N ^ converge 
weakly to u w , vi N ^ v^? Furthermore, v„ N ^ and v v have bounded and convergent first order 
moments, i.e. E[||^i iV) ||] < 00, E[\\v n \\] < 00, E[vi N) ] E[v n ], and E[|| i/f'||] ->• E[||^||]. In particular, 
it follows that normalized increments are 6(7^ ). 

The intuition behind the previous conditions is that, as the system size increases, rates increase, leading 
to an increase of the density of events on the temporal axis. Furthermore, the increments become smaller 
and smaller, suggesting that the behaviour of the CTMC will become deterministic, with instantaneous 
variation equal to its mean increment. This will produce a limit behaviour described by the solution of a 
differential equation. 

Remark 4.1. Scaling 1 can be generalized in some way, see for instance [30, 29]. However, the version 
stated here is sufficiently general to deal with sCCP programs. If we further restrict the previous scaling 
condition, requiring that gi^(X) = g w (K), where ^(X) is a locally Lipschitz function independent of 7^, 
and vi N ^ — v w , then we obtain the so-called density dependent scaling. For instance, all transitions in the 
client/server model of Example 2.1 are density dependent, as easily checked. 

8 The concept of weak convergence is introduced in Appendix C. 



13 



Remark* 4.2. The structure of the domain E of normalized variables depends mainly on conservation 
properties of the system modelled. For instance, a closed population model (i.e. without birth and death 
events) will preserve the total population (this is the case for the client /server model of Example 2.1), hence 
the domain of the normalized variables will be contained in the unit simplex in R m , which is a compact 
set. For open systems, for instance a model of growth of a population of bacteria (see Example B.l), in 
which the population can (in principle) become unbounded, the domain can be the whole R m . However, 
it is unlikely that populations actually diverge (one may question the reliability of the model itself, if this 
happens), hence one can usually find a compact set that contains the interesting part of the state space (at 
least up to a finite time horizon). In particular, some of the hypotheses that we will state afterwards, like 
locally Lipschitzness or local boundedness, rely on this implicit assumption (i.e., that we can restrict our 
attention to a compact set in any finite time horizon). We will further discuss these issues while proving 
main theorems, once they emerge. 

In order to state the fluid limit theorem, we need to construct the limit ODE. This is done according to 
the recipe of equation 1. More specifically, for each N we construct the drift or mean increment in x as 

FW(x) = £E[*W]XW(*) (5) 

where the sum ranges over all stochastic actions of the sCCP program A. If all sCCP transitions satisfy 
the continuous scaling assumption, F^ N ' converges uniformly to 

F(x) = ^EH 9l (x). (6) 

The limit ODE is therefore — F(x(t)), whose solution starting from x is denoted by x(i). Note that 
this limit ODE can be obtained in terms of TDSHA with continuous transitions only, by the construction 
of Section 2.5. In particular, the limit TDSHA corresponding to the fluid ODE has a continuous transition 
of the form (E [£„.], Jn9tt) = (Tat 1 ^!^]' 1n9-k) for each normalized sCCP transition n. 

Theorem 4.1 (Kurtz [43, 29, 30]). Let (A,Jn) be a sequence of population- sC CP models for increasing 
system size jn — > oo, satisfying the conditions of this section, and with all sCCP-actions it satisfying the 
continuous scaling condition. Let ~X.( N \t) be the sequence of normalized CTMC associated with the sCCP- 
program and x(t) be the solution of the fluid ODE. 

If Xq^ — »■ x almost surely, then for any T < oo, sup t<T flX^ (t) — x(i)|| — > as N — » oo, almost 
surely. □ 

Example. Consider again the client/server model of Example 2.1, in which both the number of clients and 
of servers is increased. Therefore, consider a sequence of models with size equal to the total number of 
clients and servers. It is easy to see that its normalized models all live in the unit simplex E in R 4 , and that 
all its transitions are density dependent, hence satisfy the continuous scaling. Assume that x = (c, 0, s, 0), 
with c + s = 1, so that Xq"' 1 = (cN, 0, sN, 0), meaning that we keep constant the client-to-servcr ratio. The 
fluid ODE associated with this model is 

ktx t — min{/c r 2; r , k s Xi} 
mm{k r x r , k s Xi} — k t x t 
kfXb kbXi 

kbX{ kfXb 

Hence, we can apply Theorem 4.1 to infer convergence of the CTMC sequence to its solution. 

Remark 4.3. The version of Kurtz theorem we presented here is similar to the one of [29], but with scaling 
taken from [30]. The point of the scaling is to prove that noise goes to zero, which is usually shown either 
by some martingale inequality or by using the law of large number of Poisson random variables, using a 
Poisson representation of CTMC. In Appendix D, we present a proof based on the Poisson representation. 

Remark* 4.4. In continuous transitions with random increments, we assumed for simplicity that the dis- 
tribution of the increment is independent from the current state of the system. However, this restriction 
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can be safely dropped, provided that we require uniform boundedness (in any compact K C E) of the 
limit first order moments of the increments, i.e. sup xgJf E[^(x)] < oo and sup xgJf E[||i/(x)||] < co, and 
uniform convergence of the expectation of i/W(x) to f(x), i.e. sup xeK HE^^^x)] — E[^(x)]| — > and 
sup xeK |E[||i/W(x)||] — E[||i/(x)||]| — > 0. Given these conditions, it is easy to check that the resulting se- 
quence of CTMC still satisfy the conditions of [43] (restricted to a suitable compact K), hence Theorem 4.1 
continues to hold. 

5 Hybrid Scaling and Hybrid Fluid Limits 

In this section we will introduce a scaling for transitions that cannot be approximated continuously, roughly 
speaking because their frequency remains constant as the population size grows. We will then prove that the 
sequence of normalized CTMC converges to the PDMP associated with the normalized sCCP model. This 
proof will be first given under a suitable set of restrictions (essentially, restricting to unguarded stochastic 
actions with generic random resets for transitions kept discrete), in order to clarify the main ingredients 
that guarantee convergence. In the next sections, we will remove some of these restrictions, considering 
more complex hybrid limits. 

The first step in the construction of the hybrid limit, which coincides with the first step in constructing 
the sCCP hybrid semantics, is the separation of model variables into discrete and continuous. This step is 
delicate and is model-dependent, as the same model can be interpreted in different ways. For example, the 
client/server model of Example 2.1 can be interpreted continuously, assuming that the number of both clients 
and servers is increased with 7 at, or in a hybrid way, assuming that only the number of clients increases, 
while the number of servers remains constant. In this case, the service rate has also to be increased in 
order to match the larger demand. This can be justified by thinking of an increased number of cores on the 
same machine, in such a way that the breakdown of a server will affect all its cores. We will discuss the 
partitioning of variables in Remark 5.1 below, after introducing the hybrid scaling conditions. 

To this end, we need to modify the conditions of Scaling 1 for continuous transitions. In particular, we 
need to allow the possibility of activating a transition only in a subset of discrete modes. This is enforced 
by guards depending only on discrete (and environment) variables. 

Scaling 2 (Hybrid Continuous Scaling). A normalized sCCP transition tt = (g„.(X), X' = X+^4 , Ai (X)) 
of a population-sCCP program (A,jn), with discrete variables X^, continuous variables X c , and environ- 
ment variables X e , and with XeB, has hybrid continuous scaling if and only if: 

1. the rate A^^X) and the update X' = X + satisfy the same conditions of Scaling 1 

2. The guard predicate g^X) depends only on discrete (X^) and environment (X e ) variables. 

Additionally, we need to define the scaling for discrete stochastic transitions. Also in this case, we will 
assume that their guard depends only on discrete or environment variables. 

Scaling 3 (Discrete Scaling for Stochastic Transitions). A normalized sCCP transition with random reset 
tt = ( g7r (X),X = ri N) (±,W^(±)),\i N) (±)) of a population-sCCP program (A,j N ) with discrete 
variables X^, continuous variables X c , and environment variables X e , with X £ E, has discrete scaling if 
and only if: 

1. the guard predicate g w (X) depends only on discrete (X^) and environment (X e ) variables; 

2. A^pt) = 0(1), A^(X) converges uniformly in each compact K C E to the continuous function 
MX); 

3. Resets converge weakly (uniformly on compact (sub)sets), i.e. for each — > x in E, r,f f \x^ N \'W^ N \^ N ^)) =>■ 
f 7r (x, W(x)), as random elements on E. 

Remark 5.1. The choice on how to partition variables into discrete and continuous is a crucial step. This 
choice is usually model dependent, and relies heavily on the knowledge and intuition of the modeller. 
However, as a general guideline, we can look at two aspects of the model: 
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Conservation Laws: Very often, the identification of discrete variables can be made by looking at con- 
servation laws, i.e. at subsets of variables whose total mass is conserved during the evolution of the 
system, as pursued in [16]. In fact, conserved variables usually are related to internal states of an 
agent which is present in one or very few copies. The identification of these sets can be carried out 
using algorithms like the Fourier-Motzkin elimination procedure [25], or using a constraint based ap- 
proach [57]. In sCCP, when describing non-flat models, these sets of variables, corresponding to state 
variables, are usually evident (cf. Remark 2.1). 

Scaling of Rates: in describing a population-sCCP model, a modeller is forced to make explicit the 
dependence of rates on the system size 7 at. Given this knowledge, it is possible to identify some 
variables that cannot be continuous, otherwise both scaling 1 and 3 would be violated. For instance, 
if we have a rate like kXiX 2 , then at least one of X\ and X 2 has to be discrete, otherwise the 
normalized rate would depend quadratically on 7^. On the contrary, k^ 1 X\X 2 is not compatible 
with both Xi and X 2 discrete, otherwise the rate would vanish. Clearly, not all rate functions are 
informative; for instance, linear rates are compatible both with discrete and continuous scaling. 

The two previous arguments can be used to set up an algorithmic procedure to suggest a possible partition 
of variables into discrete and continuous, given a population-sCCP model. However, we leave this for future 
work. 

We stress that, in general, if the modeller does not know how rates depend on the system size, she may 
choose a partition of variables and a scaling for each transition and impose a dependence of rates on system 
size that is correct with respect to the partition. This dependence has then to be validated a-posteriori, 
checking if it is meaningful in the context of the model. For instance, in a practical modelling scenario for 
the client /server example of Section 2.1, one usually has a fixed number of clients and servers and fixed 
parameters. To apply the convergence results of this paper, a specific scaling has to be assumed, and the 
parameters of the limit model have to be computed consequently. If, for instance, the number of servers is 
kept fixed, we obtain a meaningful limit if the service rate per client is constant. If this cannot be assumed, 
namely if it is the global service rate of servers that remains constant, then the service rate per client 
depends on their number N, and goes to zero as TV increases. Hence, in the limit model the service rate is 
zero. However, for a fixed population size, we can still obtain a hybrid process that approximates closely the 
CTMC, using the size-dependent rates. This phenomenology (uninformative limit, but good size-dependent 
approximation) happens also in the fluid limit setting, see for instance [47]. 

Consider now a population-sCCP model (A, 7jv) with only stochastic actions, in which transitions 
satisfy either the continuous scaling 1 or the discrete scaling 3. The limit TDSHA T(A) constructed from 
this model has continuous transitions of the form (E[z/ ff ], g v ), for each sCCP action it satisfying continuous 
scaling and stochastic transitions of the form {true, r v , for each sCCP action tt satisfying the discrete 
scaling. The limit PDMP is obtained from this TDSHA by the construction of Section 2.5. 

Example 5.1. We consider a new example with a biological flavour, namely a simple genetic network. 
Genes are the storage units of biological information: they encode in a string of DNA the information to 
produce a protein. Each cell has a biochemical machine that is capable of reading the information in a 
gene, first copying it into a mRNA molecule and then translating this molecule into a protein. Genes are in 
fact more than simple storage units: they are also part of the software that controls their own expression. 
In fact, expression is regulated by specific proteins, called transcription factors, which physically bind to 
the DNA close to a gene and activate or repress transcription. There are genes encoding for transcription 
factors that act as self-repressors. We model such a scenario here. 

To construct a population-sCCP model, we need two integer-valued variables: M, counting the amount 
of mRNA, and P, counting the amount of protein. Here the size of the system 7 is the volume times the 
Avogadro number, so that normalized variables represent molar concentrations (see for instance [62, 15]). 
We will consider a model with one agent for the gene (which can be on or off), and agents for translation 
of mRNA into protein and degradation of both protein and mRNA. 
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Figure 2: Left: comparison of stochastic trajectories and fluid ODE for the client-server model of Example 
2.1, with scaling discussed in Section 4. Parameters are k r = 2, k s — 0.8, k t = 1/50, fcb = 1/2000, 
kf = 1/1000, and initial conditions are X^ N) (0) = JV X = lOOiV, X 4 (0) = N 2 = 2N. In the plot, the 
CTMC trajectory for N = 100000 and M = 2000 fully overlap with the solution of the fluid ODE. Right: 
comparison of a trajectory of the limit PDMP and of the CTMC for the gene network model of Example 
5.1. Parameters are k p — 0.1, k t = 1, fc<j = 1, kd m — 0.01, fc& = 0.1, k u = 0.1, and initial conditions are 
P(0) = M(0) = G o //(0) = 0, G on (0) = 1. Note that both the stochastic and the hybrid system show a 
multi-modal behaviour. 



gene_on = [* — > Af = M + 1]^ . gene_on 

+ [*^P' = P-l] fe(;7 P -ip.gene_off 

gene_of f = [* — > P' = P + 1]^ .gene_on 

trsuislate = [* — > P' = P + l]yk t M ■ translate 

degrade = [* — > M' = M — l]k dm M .degrade 

+ [* -> P' = P - l] kdpP . degrade 

Inspecting the previous model, we can see that it is not flat. To convert it into a flat model, we need to 
add two additional variables, G on and G ff, with domain {0, 1}, encoding the state of the gene agent. The 
structure of the gene agent itself reveals a conservation pattern in the system, namely that G on + G f / = 1, 
as they are indicator variables of the state of the gene. Inspecting transitions, we can notice how translation 
has a rate depending on 7, suggesting that M has also to be treated as a discrete variable. On the other 
hand, repression scales as 7 -1 , i.e. it depends on the concentration of P, rather than on the number of 
molecules (repression depends only on the molecules close to the gene, the only ones that can bind to it). 
With this partitioning of variables, we obtain the following normalized TDSHA: 

• Discrete variables are G on , G ff, M, while P is the continuous variable. Q = {0, 1} x {0, 1} x N and 
P has domain [0, 00). 

• Continuous transitions are (*, P' = P + 7 _1 , jktM) and (*, P' = P — 7 -1 , jkdpP); 

• Discrete transitions are (*,G' on = 0,G' ofJ = l,P' = P - 7" 1 , fc 6 PG on ), (*,G' on = l,G' off = 0, P' = 
P + 7- 1 , k u G off ), (*, M' = M + 1, k p G on ), (*, M' = M - 1, k dm M). 

Remark* 5.2. Scaling 3 forbids discrete transitions to have a fast, 0(7) rate. If this would be the case, 
the dynamics of discrete transitions in the limit would be faster and faster, and one would expect that 
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the discrete subsystem affected by these transitions reaches immediately the equilibrium (in a stochastic 
sense). This is what actually happens, under some regularity conditions on fast discrete dynamic, namely the 
possibility of isolating a discrete subsystem affected by fast discrete transitions, which is ergodic (considering 
only fast discrete transitions), and with fast rates depending continuously on continuous variables. In this 
case, one can compute the equilibrium distribution (as a function of other variables) of the fast discrete 
subsystem, remove the fast discrete variables and average the rate functions depending on fast discrete 
variables according to the equilibrium distribution. In case one has only fast discrete variables, the fluid 
limit is given in terms of ODE [5]. This scaling can be integrated quite easily in our framework, using 
the limit theorem of [5] instead of Theorem 4.1 and defining syntactically the averaging at the level of the 
TDSHA, given a method to compute the equilibrium distribution. 

We now turn to discuss the limit behaviour of a model showing hybrid scaling, i.e. with both discrete and 
continuous transitions. We will stick to further simplifying assumptions for the moment: the sCCP program 
has no instantaneous transitions, all stochastic actions are unguarded and have continuous rates, variables 
and transitions have been partitioned into discrete and continuous, discrete transitions have deterministic 
resets and satisfy discrete scaling 3, and continuous transitions satisfy continuous scaling 1. 

We are now ready to state the main result of this section, namely that, under these restrictions, a 
normalized CTMC constructed from a sCCP program converges weakly to the PDMP constructed from 
the normalized TDSHA associated with the sCCP program. 

Theorem 5.1 ([11]). Let (A,Jn) be a sequence of "population- sC CP models for increasing system size 
7jv — > oo, satisfying the conditions of this section, with variables partitioned into discrete X^, continuous 
X c , and environment ones X e . Assume that discrete actions satisfy scaling 3 and continuous actions satisfy 
scaling 2. Let ~X.( N \t) be the sequence of normalized CTMC associated with the sCCP program and x(t) 
be the PDMP associated with the limit normalized TDSHA T{A). 

If Xg W ' x (weakly) and the PDMP is non-Zeno, then X(i) converges weakly to x(t), X =>■ x, as 
random elements in the space of cadlag function with the Skorohod metric. 9 

Proof. We just sketch the proof here. A detailed proof can be found in Appendix D. The main idea is to 
exploit the fact that we can restrict our attention to CTMC and PDMP that do at most to discrete jumps. 
This is sufficient to obtain the weak convergence of the full processes, for two reasons. The first is related to 
the nature of the Skorohod metrics, which discounts the future (i.e. only 1/2 T of the distance comes from 
time instants greater than T), while the second is the non-Zeno nature of the limit PDMP, which implies 
that we can consider no more than to jumps up to time T, with probability 1 — e m , for e m — >• as to — > oo. 

In order to prove weak convergence of it!£\ the CTMC with at most m jumps of discrete transitions, 
to x m , the PDMP with at most to jumps, we can exploit the piecewise deterministic nature of PDMP, 
applying Theorem 4.1 inductively. At the first step, we will prove that the time t[ N ^ of the first stochastic 
jump for converges weakly to T\, the first jump time of x (Lemma D.2 in Appendix D), and also 

the state X^ n \t[ N) ) after time t[ N) converges weakly to x(ti) (Corollary D.l). This shows convergence 
of the processes up to the first stochastic jump. Exploiting this and the strong Markov property, we can 
restart x(i) at time n from x(n) and X(i) from X(r} JV ' ) ) at time t[ N ^ and apply Theorem 4.1 and its 
corollaries again (actually, a minor modification of Theorem 4.1, allowing to sample probabilistically the 
initial conditions of the ODE), to prove weak convergence of the CTMC to the PDMP up to the m-th jump, 
for any to. Note that this argument is based on the continuity of vector fields, rates and resets, which holds 
in our setting as their guards depend only on discrete and environment variables, hence their values do not 
change in each deterministic piece of the PDMP dynamics. □ 

Example. We consider again the simple client server network of Example 2.1, but with a different scaling 
compared to Section 4. In particular, we consider as size 7jv the number of clients, assuming that the 
number of servers remains constant, but with service rate depending linearly on 7jy. In this way, the rate 
of the request transition of client agents is j N min{k r Xr N \ k s Xi} , and it satisfies the continuous scaling. 
Breakdown and repair transitions, on the other hand, will be kept discrete as they modify only the number 
of available servers. As their rate is independent of jn and their reset is constant and also independent of 
N, they both clearly satisfy the discrete scaling. The limit TDSHA that we obtain in this way is shown 

9 See Appendix C for a brcif introduction of these concepts. 
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(b) CTMC trajectory, N = 1000 




(c) t = 10000, PDMP and CTMC cumulative distributions 



Figure 3: Client Server model of Example 2.1, compared with the hybrid limit scaling keeping the number 
of server fixed to 2. Parameters are k r = 2, k s = 0.01, k t — 1/50, fc& = 1/2000, kf = 1/1000, and initial 

conditions are X ( r N) {0) = N, Xi(Q) = 2. Fi gures 3(a) and 3(b) show one trajectory of the PDMP and the 
CTMC for N = 1000, respectively. Figure 3(c), instead, compares the empirical cumulative distribution of 
the PDMP limit and the CTMC, for N = 1000 and N = 10000, at time t = 10000, generated from 2500 
sampled trajectories. 
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in Figure 1. As the hypotheses of Theorem 5.1 are satisfied, the sequence of CTMC models obtained from 
sCCP with the standard stochastic semantics converges (weakly) to the limit TDSHA. 

This can be seen in Figure 3, where we compare a trajectory of the CTMC with a trajectory of the 
PDMP, and the distribution of the number of clients requesting service at time t = 10000. 

Example. We reconsider now the genetic network model of Example 5.1. Also in this case, we can expect 
a bimodal behaviour for the CTMC semantics, due to the gene working as a discrete switch. If the binding 
strength of the repressor is large, meaning that the protein will remain bound to the gene for a long time, 
then the gene will be switched off for long periods, and we may expect to see a bursty behaviour. This 
is indeed the case, as can be seen in Figure 2(b). Moreover, the hybrid limit constructed in Example 
5.1 matches perfectly this behaviour, as can be seen in Figure 2(b). As the model satisfies the (scaling) 
assumptions of Theorem 5.1, we can conclude that this is indeed the consequence of the (weak) convergence 
of the sequence of CTMC models to the hybrid limit. 

5.1 More on random resets 

The scaling condition 3 requires us to check a convergence condition on resets that seems quite complicated 
at first glance, as it involves checking weak convergence of reset kernels for any possible convergent sequence 
of states. We chose this condition because it is very general and it interfaces smoothly with the inductive 
proof technique that we use in the paper. However, in the following, we will briefly discuss several simpler 
conditions that can be checked more easily, and that should cover most practical cases. 

We first start by observing that we can split the convergence condition in two parts, i.e. we can check 
that r^(x,w) — )■ fjr(x,w) uniformly in x and w and that W^^xW) => W^x) (weakly), for any 



for any compact set K C E and any uniformly continuous function g : E — > M and that J g(y)P{Wjr(x) = 
y} dy is a continuous function [40], which may be sometimes easier to check. Moreover, in practice we can 
expect wi-^ and W„ to have a simple structure, which should facilitate the task of verifying the scaling 
condition. 

First of all, if and W„- do not depend on x, then the condition reduces to =>■ W„ which 

can be checked by showing one of the equivalent conditions of the Portmanteau theorem [9]. In particular, 
the condition is trivially true if wl W ' docs not depend on N, i.e. if = W,. 

We consider now two examples, to illustrate the use of random resets and the hybrid convergence in this 
case. 

Example 5.2. We consider a small variation of the client server model of Example 2.1. The difference is that 
we will assume different levels of severity of a breakdown, so that the repair time can be variable, depending 
on this level. For simplicity, we assume a single server, but a generalization to more than one server is 
straightforward. In order to model this situation in sCCP, we can either increase the number of internal 
states of the server (one for each level of damage) or use an additional (discrete) variable. We chose this 
second approach, introducing D, the damage-level variable. We assume that D takes values on the integers, 
and that each time a breakdown happens, its value is sampled from a geometric distribution with parameter 
0.5, so that we have a probability l/2 fe to see a damage of level k. We therefore let W ~ Geora(0.5), so 
that ¥{W = k} = (1 — 0.5)' c_1 • 0.5. We further assume that the repair time is proportional to the damage 
level, so that the rate of repair is kf/D. We therefore obtain the following sCCP code, where variables 
X r ,X t ,Xi,Xb are as in Example 2.1: 




client = [* — > X' r = X r — I A X' t = X t + l] mi » {ir x r , 7J ,fc,x,} • client + 

[* -> X' r = X r + 1 A X' t = X t - l] kt x t ■ client 
server = [* ^ X[ = X, - I A X' b = X b + 1, D' = W] kbXi ■ server 
+ [* -> X[ = X t + 1 A X' b = X b - l} kf /D-x b ■ server 
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Figure 4: Empirical cumulative distribution of clients requesting service at time t = 10000 and average 
number of clients request service for the client server models of Example 5.2. The top row shows the model 
with severity level of breakdowns samples according to a geometric distribution with probability p = 0.5. 
Parameters are as in caption of Figure 3, a part from fc/ = 1/200. The bottom row shows the model with fix 
rate lognormally distributed with mean -2.5 and standard deviation 1.0. Note that both histograms present 
a similar pattern. The bimodality of the distribution is captured for the geometric breakdown. Moreover, 
the hybrid model has less variability in the distribution (it has a sharper cumulative distribution function) . 
The averages are almost indistinguishable. 
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In this case, we clearly have that Xi, X b , and D are discrete variables, while X r and X t can be approximated 
continuously. D can also be seen as an environment variable, as it is used to modify a parameter of the model. 
Therefore, the transitions of the client agent become continuous transitions in the associated TDSHA, while 
the transitions of the server agent remain discrete and stochastic. Note that we made explicit the dependence 
on size in the rate functions. Clearly, all transitions satisfy the scalings of Theorem 5.1. This is true also 
for the breakdown transition, as W docs not depend on the current state of the system. It follows that 
Theorem 5.1 applies also to this example, see also Figures 4(a) and 4(b). 

A variation of this model is to replace the finite damage levels with a continuous level of damage, 
essentially sampling the repair rate from a continuous distribution. This can be done in sCCP by using a 
real- valued environment variable, call it K. For simplicity, here we assume that the fixing rate is sampled 
from a lognormal distribution with mean /j, and standard deviation a. We can obtain this variant of the 
model by replacing the server agent with the following one: 

server = [* X[ = Xi - 1 A X' b = X b + 1, K' = W] kbXl ■ server 
+ [* -> X[ = Xi + 1 A X' b = X b - l] K . Xb . server 

Also in this case, the hypotheses of Theorem 5.1 are satisfied, and convergence to the hybrid limit works 
(see Figures 4(c) and 4(d)). 

We turn now to discuss convergence of ~W^\x) to W^x) when they depend on x. The situation is 
more delicate, as convergence has to be uniform. In the following, however, we list some sufficient conditions 
to guarantee convergence, that are of practical relevance. 

1. wi N \x) = W 7r (x) and W 7r (x) depends continuously on x; 

2. Wf'fx) and W T (x) are discrete distributions with mass concentrated on points {yi, . . . , y^, . . .}, 
and P{wl^(x) = yfc} converges to P{W T (x) = y^} uniformly in any compact K C E; 

3. W^^x) and W(x) arc unidimensional real random variables, with cumulative distribution functions 
F^ N \y,x) and F(y,x), such that, for each xW — > x, F^ N \y,x^ N>> ) — > F(y, x) pointwise for any 
continuity point y of F(y,x). 

4. WW(x) and W(x) have values in R h , and they have continuous density functions g( N ^ (y, x), g(y,x), 
and sup ygRfcj:j;;eA - \\g( N \y,x) — c/(y,x)|| — > 0, for each compact set K C E. 

5. WW(x) and W(x) can be decomposed into the product of marginal and conditional distributions 
that converge in the sense of Scaling 3, i.e. W(x) = W il (x)W i2 (x,w il ) 

W, x.w ) , WW (x) = Wf> (x)wtf» (x, w n ) • • • wj?> (x, Wil , . . . , 

x,w), as x^) — > x and — > w. 

6. W^(x) and W(x) arc mixtures of distributions W^(x) and Wj(x) of one of the previous types, 
i.e. WW(x) = EjPj^WWj^W and W(x) = E J ^(x)W j (x). 

It is straightforward to show that each of these conditions implies that W^^x^) => W(x) as xW — > x, 
hence they can be used whenever it is more appropriate. 

Example 5.3. We consider again the client-server model of Example 2.1, but modify it by including the 
spread of a worm epidemic. We consider a situation in which a worm has spread on the network and 
activates on a specific date, sending all infected clients into a non-working state, called Xd, from which they 
need some time to recover. We abstract from the epidemic spreading and model the effect of the epidemics 
as an event that affects synchronously all clients and infects each of them with probability p. Let Wi(X), 
i = 1, 2, be binomial distributions with success probability p and size given by X. For simplicity, we ignore 
the breakdown and repair of servers, so that we need four variables, X r , X t , Xd, and Xi, and initial network 
client || worm, where client is as in Example 2.1, while worm is given by the following code: 

worm d = [* -> X' r = X r - Wi{X r ) A X' t = X t - W 2 {X t ) A X' d — Xd + Wi{X r ) + W 2 (X t )] kw .worm 
+ [* -> X' d = X d - 1 A X' r = X r + l]k d -x d -worm 
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(a) Example 5.3, clients requesting at time t = 10000 



(b) Example 5.3, average 



Figure 5: Comparison of the empirical cumulative distribution of clients requesting service at t = 10000 
(left) and average (right) of the limit PDMP and the CTMC at system size 1000, for the client-server model 
with worm infection of Example 5.3. There are 2 servers and — N clients. Parameters are k s = 0.0l7Ar, 
k r = 2, k t = 1/40, k w = 1/2000, Wd = 0.1, and the infection probability is p = 0.33. 



In the limit TDSHA model, the infection action remains discrete and stochastic, while all others are 
approximated continuously (including the recovery). Here, the system size is clearly the number of clients 
(server dynamics are ignored, so the number of servers can be seen as a parameter). When looking at the 
normalized model for system size = N, then the reset of the infection transition, say for what concerns 
clients thinking, is x t — jfW([Nx t \ ), which can be also written as x t — ^ N ^^ ^ s . t ^ W([Nx t \), provided 
L/Vx t J > 0. By the law of large numbers, this expression converges to x t — px t , so this should be the reset 
of the limit PDMP. However, to apply the limit results of this section to this model, we have to prove that 
jrW(\_Nx( N ^ J) — > xp for any x^ N ' — > x and then apply point 3 of proposition above. To show this, observe 
that if x > 0, then > x/2 ultimately, hence [Nx^\ -!• 00, so that -jj^jW([Nxt\) p. When x = 0, 

instead, observe that j^W ([N x^ \) < ^ Nx ^ - — > 0, which shows the desired convergence. 

Remark* 5.3. The framework of population-sCCP programs forces the modeller to explicitly consider the 
notion of system size and to incorporate it in the rate functions. This requirement greatly simplifies the 
manual verification of the scaling conditions, at least for what concerns rate functions. 

There are three kinds of conditions to check: convergence of rate functions, regularity of rate functions 
(local Lipschitzness), and convergence of reset kernels (or of increments). 

Most of the time, these checks are easy to carry out: rates are often density dependent and differentiable 
and resets are constant increment updates. If rates depend on 77V, usually this dependence is simple and 
verifying convergence poses no challenges. For instance, in a biochemical system, the (normalized) mass 
action rate when two molecules of the same kind react together has the form kx(x ~ ^~), which is easily 
seen to converge uniformly in any compact set (i.e., whenever x is bounded). As for the regularity of rates, 
most of the time we will deal with functions constructed by algebraic operations, plus some other function 
like the exponential or the logarithm. All these functions are analytic [41], hence locally Lipschitz. Also 
the use of minimum or maximum preserves this property. What can be more challenging is the case in 
which resets have a stochastic part depending on the current state of the model. However, the conditions 
discussed in this section should cover most of the practical cases. Indeed, we can expect in most models the 
use within resets of simple discrete or continuous distributions, like Gaussian or uniform ones. 

What is undoubtedly more challenging is to make this check automatic. This is partly due to the 
generality of sCCP as a modelling language, which allows a user to express very complex rates and updates. 
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Hence, a malicious user can construct models that are very complicated to check. However, in most practical 
cases it may be possible to set up automatic routines that verify the scaling, by clever use of computer algebra 
systems. 

Another alternative is to identify a library of functions (for both rates and resets) which are guaranteed 
to satisfy the regularity and scaling conditions. This is what happens in the process algebra PEPA [60], 
where the syntactic- derived restrictions on the possible set of rate functions and updates guarantee that the 
conditions of the fluid approximation theorem (Theorem 4.1) are always satisfied. Constructing a library 
of "good" functions restricts the expressive power of the language, but should be enough to cover most 
practical modelling activity. Furthermore, libraries can be extended when needed, and the user can also 
use additional functions, if she also provides a "certificate of correctness". We will pursue this line of 
investigation in the future, with the implementation of the framework in mind. 

6 Dealing with instantaneous transitions 

In this section we discuss convergence to the hybrid limit in presence of instantaneous events. These events 
remain discrete also in the limit process and can introduce a discontinuity in the dynamics that is triggered 
as soon as their guard becomes true. The class of limit PDMP obtained in this way is more difficult to 
deal with than PDMP with just stochastic jumps. In fact, we cannot rely any more on the "smoothing" 
action in time of a continuous probability distribution like the exponential, but we need to track precisely 
the times at which instantaneous events happen. In particular, there can be time instants in which we can 
observe a jump in the limit process with probability greater than zero. This is particularly the case when 
the hybrid limit is a deterministic process, i.e. a process without discrete stochastic transitions and random 
resets. 

From the point of view of weak convergence, dealing with instantaneous transitions requires us to prove 
that their firing times in the sequence of CTMC models converge to the firing time in the hybrid limit 
model. Furthermore, we need prove also convergence of the state after the reset. As we will see, both 
properties are not guaranteed to always hold. The problem resides in the intrinsic discontinuous nature of 
the exit times and resets on the activation region of guards. Thus, to prove convergence, we need to impose 
further regularity conditions on the PDMP, forcing its dynamics to avoid these discontinuous regions (with 
probability 1). We will start by discussing the issues with the exit time, then turn to reset kernels, and 
finally move to the limit theorem. After having discussed examples, we will consider a small extension of 
sCCP, allowing guards to depend on (simulation) time, and discuss limit theorems for this extended class 
of models. 

Convergence of Exit Times 

Convergence of exit times docs not hold in general. Focussing on a deterministic trait of the PDMP 
dynamics, the problem is created by trajectories of the vector field that activate a guard by touching 
tangentially its boundary surface, as shown in Figure 6(a). In fact, for N large enough trajectories of the 
CTMC are contained in a small flow tube around this solution, hence some of them can cross the surface, 
while others may miss it. Another class of trajectories that creates problems is that of trajectories remaining 
in the boundary of the activation region of the guard (i.e., the discontinuity surface of the guard predicate) 
for a non-negligible amount of time, say for the time interval [ii,^]- Here, the problem is that a CTMC 
trajectory can activate the guard in any time instant between t\ and ti. 

However, convergence holds for trajectories of the vector field which transversally cross the discontinuity 
surface of a guard predicate, meaning that they intersect the surface at time t and enter in the interior of 
the region in which the guard is true just after t. Fortunately, this is the situation we are more likely to 
find in practice. 

Now we prove a result about convergence of exit times that can be applied to the setting of Section 4. 
This result requires that almost surely only transversal crossings occur. We will then extend the hybrid 
limit theorem imposing this condition. We postpone discussion about how to check and/or enforce such a 
condition until later in the section. 
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We need some preliminary definitions. The first logical step that we need is to move from predicates to 
continuous functions in the definition of a guard. 

Definition 6.1. Let g(x) be a guard predicate with closed activation region. A function h : E — > K is 
an activation function or a guard function for g if it is a continuous function, and if the sets of points 
{x | h(x) > 0} defines the activation region of g: g(x) is true if and only if h(x) > 0. 

The function h is a robust activation function for g if and only if <9{x | /i(x) < 0} = d{x | h(x) > 0} = 
{x | h(x) = 0}. 

The notion of robust activation function essentially guarantees that the interior of the set in which g is 
true is {x | h(x) > 0}, so that it makes sense to define a transversal crossing of the guard g as a change of 
sign of the function h. The discontinuity surface of the guard is therefore H = {x | h(x) = 0}. 

The notion of robust activation is a very reasonable assumption to make. In practical cases, the function 
h will be a piecewise smooth function, usually piecewise linear, and the surface % will be a union of 
diffcrentiable (or even analytic) manifolds of dimension n-1 or less. In particular, a function like h{x) = x\ 
if X\ < and h(x) = if x\ > is forbidden. This function is a bad activation function as its discontinuity 
surface % is the whole half-space {x\ > 0}, and furthermore the sequence of functions h( N \x) = /i(x) — 
converges uniformly to h but their activation set is empty for all N. This justifies the use of the term 
robust: we are forbidding functions which under any small perturbation induce a discontinuous change in 
the activation set. From now on, we restrict our attention to robust activation functions. 

Coming back to PDMP derived from TDSHA, it is easy to see how to construct a guard function for the 
class of guards of instantaneous transitions. In fact, we are considering only positive boolean combinations 
of atoms of the form /ij(x) > 0, where hi is continuous. Then, we just need to combine the functions hi 
with maximum and minimum to take into account the structure of boolean combinators. Furthermore, if 
in the TDSHA we have k instantaneous transitions m,...,irk, with activation functions hi, . . . , hk, we can 
combine them into a unique activation function h by taking their maximum: h(x) = max{/ii(x), . . . , hk(x)}. 
The function is a robust activation function which is greater than or equal to zero if and only if at least one 
guard is true. We call h the activation function of the PDMP. 

Consider now a continuous trajectory x : [0, oo) — > E and let h : E — »■ R be a robust activation function, 
such that /i(x(0)) < 0. 

Definition 6.2. The robust activation function h (or the corresponding activation surface H) is transversal 
for the trajectory x(i) if and only if, letting ( — inf{t | h(x(tj) > 0}, there is a 5 > such that h(t) > for 
te(C,( + 5]. 

Suppose now X(t) is a stochastic process with almost surely continuous trajectories, like the fluid limit 
x(t), with initial conditions (drawn from a distribution) x . 

Definition 6.3. An activation function h is robustly transversal to X(i) if and only if the set of trajectories 
for which it is transversal has probability 1. 

The notion of robustly transversal activation function can be lifted to PDMP, by requiring that all 
the guards of the PDMP are robustly transversal in each continuous trait of the dynamics, 10 i.e. that 
instantaneous transitions are activated transversally: 

Definition 6.4. A PDMP x(t) is robustly transversal if and only if with probability one its trajectories 
are robustly transversal in each continuous trait. 

Consider now a sequence (t) of normalized CTMC associated with a sequence (A, 7at) of population- 
sCCP models, and assume that X^(t) converges weakly to X(£) as N — > oo, where X is a.s. continuous. 
Furthermore, let h, h^ : E — > E be the activation functions for X and X.( N \t), respectively. Assume that 
/iW converges to h uniformly for each compact set K C E and call C (JV) = inf{i | h^(± {N \t)) > 0}. 

We can show the following lemma, whose proof is given in Appendix D. 

10 This means that, if the i-th jump of a PDMP trajectory x(t), happening at time Tj, corresponds to an instantaneous 
transition, then there is a <5 > such that by extending the continuous trajectory starting at x(T^ 1 = up to Tj + <5, the 
crossing is transversal, i.e. h(x(t)) > for t 6 (Tj,Tj + 8). 
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(a) xi trajectory (b) P{.2~W(t) = 1} 

Figure 6: Left: limit trajectory of Xi{t) for Example 6.1. Right: F{Z( N '(t) = 1} as a function of i, for 
different values of N . Bi-modality of the distribution around t = 1 and t = 4 is manifest. 

Lemma 6.1. Let (A, Jjsr) be a sequence of population- sC CP models for increasing population size 7at — > oo, 
as N — > oo. Let X^'(t) 6e the associated normalized sequence of CTMC, and suppose "XS N > X, where X 
/ias a.s. continuous sample paths. Let h^ N ' , h be activation functions for X( w ) and X, swc/i t/iai h^ N ' — > h 
uniformly, and suppose h is robustly transversal to X. Then =>■ □ 

Example 6.1. We discuss now a hand-crafted example to demonstrate the need for the request of transver- 
sal activation of a guard. We consider a population-sCCP program (A, jn) with three continuous variables 
Xi, X2, and X3 taking values in Z, and one discrete variable Z. 

agent 1 = [X 2 > X' x = X x + l]x 2 ■ agent 1 

+ [X 2 <0^-X(=X 1 -l]| X2 |.agentl 

agent2 = [* — > X' 2 = X 2 + l]x 3 ■ agent2 

+ [* -> X' 2 = X 2 - 1] 127JV . agent2 

agent3 = [* — > X^ = X3 + l^jy ■ agent3 

doom = [X\ > 5jn Z = l]oo:l -0 

The initial network is agentl || agent2 || agent3 || doom, with initial value of variables ^i(O) = 7^, 
X 2 (0) = 9 1N ,X 3 {0) = Z(0)=Q. 

Normalizing the model, we observe that all non-instantaneous transitions satisfy the continuous scaling. 
If we compute the drift, we obtain the following set of ODEs (as the guards in the transitions of agentl 
elicit with the modulus), with initial conditions (1, 9, 0): 




These equations can be integrated directly, obtaining x\ (t) = t 3 — 6t 2 + 9t + 1, whose trajectory can be seen 
in Figure 6(a). Notice that, for t = 1, the curve hits tangentially the line x\ = 5, which is the activation 
surface associated with the robust activation function /i(x) = x± — 5, while it transversally crosses such a 
line at t = 4. In Figure 6(b), we show the hitting time distribution for the sequence of CTMC for increasing 
size 7jv, by visualizing the passage-time distribution of the event Z = 1, i.e. ¥{Z(t) — 1} as a function of t. 
As we can see, the bimodal nature of the distribution persists also for large N, supporting the claim that 
tangential activation creates problems for convergence of exit times. 
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Convergence of Reset Kernels 

There is a second source of discontinuity induced by instantaneous transitions, namely in the reset kernel of 
the PDMP on the activation surface of the guards. The problem lies in the fact that, if we have more than 
one instantaneous transition, a specific one will be active only in a subregion T-L^ of the activation surface 
T-L, where % = {x \ h(k) = 0} and = {x € H | /i„-(x) = 0}. In particular, the reset kernel is not robust 
in the boundary &u7i v of in H. In fact, if a trajectory of the PDMP hits H in such a boundary, then 
a small perturbation can change the set of active guards (including or excluding n), and the fate of the 
system may be different. The same problem can manifest itself on the intersection between the activation 
surfaces of the guards of two instantaneous transitions 7Ti and ir 2 : in any neighbourhood of (the boundary 
of) this region, we can find points in which only one of m and TT2 is active. This lack of robustness reflects 
itself in a loss of continuity of the reset kernel. Hence we can no longer rely on this property to prove the 
convergence of the state after the reset (a fact used in the proof of Theorem 5.1). 

Intuitively, convergence cannot hold for trajectories x(t) of the PDMP that hit H in d-uT~Ln- In fact, 
trajectories of the CTMC that converge to x(t) can hit either or is complement in H, implying that 
the CTMC can be reset to a different state from the PDMP. Furthermore, the probability of hitting 
or its complement in H will depend on the geometry of % around the boundary d^H^, rather than on 
the priority functions governing the choice for the PDMP. We illustrate this point by the following simple 
example. 

Example 6.2. We consider a model of a one-dimensional random walk in sCCP. More specifically, we 
consider a population-sCCP program (.4, 7jv) with two variables to be approximated continuously, X and 
Y, and one variable Z to be kept discrete. In particular, X and Y will count how many times we go up and 
down, respectively. We have the following sCCP code: 

up = = X + l] 7JV .up 

down = [* — > Y' = Y + 1] 7JV .down 

dooml = [X > 7 W -> Z = l]oo:99 . 

doom2 = [Y > 7w -> Z = -l]^ . 

The initial network is up || down || dooml || doom2, with initial value of variables X(0) = Y(0) = Z(0) = 0. 
This system is easily seen a to be a one-dimensional random walk for the variable W = X — Y. When 
visualized in the plane X, Y, the trajectories of the random walk are rotated by 45 degrees along the 
line defined by the equation Y = X (see Figure 7(a)). In the normalized system, such trajectories will 
eventually hit one of the discontinuity surfaces at x = 1 or y = 1. The vector field of the PDMP is given 
by F(x,y) = (1, 1), hence the solution from the point x(0) = y(0) = is x(t) = y(t) = t, which corresponds 
to the line x — y. This line hits the activation surface in its corner point (1, 1), where both transitions are 
active, hence after the reset Z = 1 with probability 0.99, and Z = — 1 with probability 0.01. However, in 
the CTMC X and Y can only increment by l/N asynchronously, meaning that each trajectory has to hit 
one of the two segments of the activation surface before the other. By a simple symmetry argument, we 
can see that for each N, the probability that Z — 1 after the reset is 0.5 (see again Figure 7(a)), hence 
convergence cannot hold for this model. 

To have some hope to obtain convergence after a reset, we need to exclude trajectories of the PDMP 
x(t) that are troublesome. Consider a sCCP model with m instantaneous transitions 7ri,...,7r TO , and 
let km-, • • • , h Wm be the corresponding activation functions, and h = maxj/i,^, . . . , h nm } be the activation 
function of the PDMP. Define the activation surface H = {x | /i(x) = 0} and H Vi = {x £ % \ h 7rj (x) = 0}. 
Let Dj — dfi'H 7 , j be the boundary of V. 7Tj in % and D — \J. Dj be the union of such boundaries, called the 
discontinuity region of H ■ 

Definition 6.5. A PDMP x(t) obtained from a TDSHA T has the robust activation property if and only 
if the set of trajectories hitting the discontinuity region D of % has probability zero. 

This property essentially tells us that we can ignore the situations in which the PDMP activates instan- 
taneous transitions in non-robust points. 

However, this is not the only issue with reset kernels. There is another problem that can arise when 
we allow the activation functions h„ of instantaneous transitions to depend on the size 7jv, i.e. when 
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(a) Example 6.2 (b) Example 6.3 

Figure 7: (left) Exemplification of the random walk model of Example 6.2. The activation surface of 
instantaneous transition dooml is shown in red, while the activation surface for doom2 is visualized in blue. 
Trajectories are coloured according to the surface they hit. The black trajectory is the solution of the 
PDMP associated with the model, (right) Exemplification of the random walk model of Example 6.3. The 
trajectories coloured in red fire the doom2 instantaneous transition, whose activation surface is also shown 
in red. The dotted blue line is the activation surface of dooml. Obviously, no trajectory is coloured in blue, 
as no trajectory can hit that surface. 
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hi: 7^ h n . The problem, in particular, manifests itself if the activation surfaces of two or more guards with 
size-dependent activation functions overlap (robustly) in the limit model. 

Example 6.3. We consider a simple random walk model, with one variable X with values in Z, that will 
be approximated continuously, and a variable Z that will remain discrete. 

rw = [* X' = X + 1] 7N .rw 

+ [*^X' = X-l] 7N .rw 

dooml = [Z = AX > j N k - 1 ->• Z = lJ^i.O 

doom2 = [Z = AX > j N k - 2 -> Z = -lJ^i.O 

The initial network is rw || dooml || doom2, with initial value of variables X(0) = Z(0) = 0. The activation 
surface for Z = of dooml in the normalized model is the hyperplane X = k — while that of doom2 is 

the hyperplane X = k — . As X is increased and decreased by one unit only (hence X is modified by 
units), for any N the system will always fire doom2 (notice that the additional condition on Z forbids firing 
dooml once doom2 has fired). Hence, Z = — 1 eventually, for the CTMC models at any population level N 
(see also Figure 7(b), for k = 0.5). However, in the limit model both activation surfaces converge to the 
limit hyperplane x — k, hence in the limit PDMP Z takes value -1 only with probability 0.5. Convergence 
again fails. 

This example suggest that, in order to avoid such problems, we should either forbid A-dependent guards 
in instantaneous transitions of population-sCCP models, or try to forbid those situations in which more 
than one TV-dependent guard can be robustly activated at the same time in the limit model. We state this 
in the following definition. 

Definition 6.6. A set of activation functions of guards h[ N ^ , . . . , hffl of a population-sCCP model is 
size- compatible if and only if, for each j such that is size-dependent (i.e. hj N ^ converges uniformly to 

hj in each compact set but ^ hj), then int-u(Hj) C\Ui = 0, for each i ^ j (i.e. in any point in which 
the limit activation function hj is robustly zero in "H, no other /i, function is zero). 

The limit PDMP x(i) obtained from a population-sCCP model is size- compatible if and only if the set 
of activation function of guards of instantaneous transitions is size compatible. 

Technically, Definitions 6.5 and 6.6 arc the key properties that allow us to extend a lemma on the 
convergence of continuous reset kernels (Lemma C.l), into a more general result capable of dealing with 
discontinuous reset kernels of the form induced by instantaneous transitions. This will be formally discussed 
in Lemma D.3 in Appendix D. 

Hybrid Convergence Theorem 

The previous lemmas and hypothesis are the core argument for extending Theorem 5.1 in the presence of 
instantaneous transitions. Before proving it, we make explicit the scaling for instantaneous transitions. 

Scaling 4 (Discrete Scaling for Instantaneous Transitions). A normalized instantaneous sCCP transition 
with random reset tt = ( ff ( JV )(X),X' = f W (X, (X)), pi N) (X)) of a population-sCCP program 

(.4, 7jv) with variables partitioned into X = (X d ,X c ,X e ), with X £ E, satisfies the discrete scaling if 
and only if: 

1. The activation function ft/ w )(X) of the guard g( N \X.) converge uniformly in each compact K C E to 
a continuous function /i(X); 

2- pi^(X) = 0(1), pi^(X) is continuous and it converges uniformly in each compact K C E to the 
continuous function p,r(X); 

3. Resets converge weakly (uniformly on compacts), i.e. for each — > x in E, r (N ) (xW, W^^xW) =>■ 
f (x, W(x)), as random elements on E. 
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If an instantaneous sC CP-transition of the form it = [/i^(X) > 
— > X' = r^(X, W^^X))]^ .p(N)r%\ satisfies the previous scaling, then the corresponding transition in 
the limit TDSHA is given by (K(x)) > 0, r^x, W(x)), p w (x)). Consider now the limit PDMP x on E, 
associated with the normalized TDSHA 7~(A) constructed from a sequence (.4, 7jv) of population-sCCP 
models, in which all transitions satisfy Scalings 2, 3, or 4. 

Theorem 6.1. Let (A,jn) be a sequence of population- sC CP models for increasing system size jn — > oo, 
as N — > oo, with variables partitioned into X = (X<j,X c ,X e ), with discrete stochastic actions satisfying 
scaling 3, instantaneous actions satisfying scaling 4, and continuous actions satisfying scaling 2. Let X.( N \t) 
be the associated sequence of normalized CTMC and x(t) be the limit PDMP associated with the normalized 
limit TDSHA f(A). 

If x { N) x (weakly) and the PDMP is non-Zeno, robustly transversal, has the robust activation 
property and it is size-compatible, then X.( N \t) converges weakly to x(i), X( w ' =>■ x, as random elements 
in the space of cadlag function with the Skorohod metric. 

Proof. The proof is only sketched here, see Appendix D for further details. The idea is to reason as in 
Theorem 5.1, just replacing the machinery about jump times by a more refined one taking into account 
also instantaneous jumps. Essentially, we have to take the minimum of the stochastic and instantaneous 
jump times, and choose which reset kernel to use according to which kind of event (stochastic or instanta- 
neous) fires first. The weak convergence of these new jump times and reset kernels follows easily from the 
convergence of stochastic and instantaneous ones. □ 

Remark* 6.1. Theorem 6.1 relies on three global properties of the PDMP associated with the population- 
sCCP model, namely the robust transversal, the robust activation, and the size-compatibility property. 

The last requirement should be generally relatively easy to check, as it depends only on the activation 
functions of guards, and not on their interaction with the vector field. In fact, in most practical cases, 
guards are boolean combinations of linear predicates, hence if some of them depend on N, by computing 
the limit activation function (which should also be a combination of linear functions hj .i, . . . , hj^-), one 
can discover if there is a robust overlapping of guards by solving a linear system of equations for each pair 
i, j of size-dependent guards (say /ii j i(x) = 0, . . . , /^/^(x) = 0, hj t i(x) — 0, . . . , hj^, (x) = 0) and checking 
if the solution has dimension n — 2 or less in each mode. Here we are implicitly assuming that the PDMP 
and the sequence of CTMC can evolve in an open subset of E of dimension n in each mode (if this is not 
the case, due to conservation laws, we can just use these laws to reduce the dimensionality of the system). 
If the activation functions are non-linear, then the previous approach can be still carried out, but checking 
the intrinsic dimensionality of a non-linear manifold is obviously more complicated. 

On the other hand, the robustness conditions on the PDMP are more complex to check. The robust 
transversal property requires that the PDMP transversally crosses an activation surface with probability 
one. If the PDMP is deterministic (i.e., there is no discrete and stochastic transition), then this check can be 
carried out along the single trajectory starting from the given initial state. In case the number of firings of 
instantaneous transitions is finite, or these events are ultimately periodic, then it may be possible to set up 
a semi-decision procedure for this task. The problem, also in this simple case, is that checking if a trajectory 
has a tangential crossing is the same as looking for a non-simple zero 11 of the activation function. However, 
no root finding algorithm is able to properly deal with non-simple zeros, even for analytic functions, see e.g. 
[59]. In fact, we can only hope to compute a non-deterministic approximation of the trajectory, namely a 
flow tube around it, which for a tangential activation would intersect the surface but not cross it completely. 
This still does not prove that there is a tangential zero, just that we cannot ignore this possibility. Note 
that if a trajectory does not intersect the activation surface but a flow tube of small radius around it does, 
then the behaviour of the sequence of CTMC can diverge from that of the PDMP due to small fluctuations 
around the limit trajectory, which can lead to completely different behaviours. In those cases, even if 
convergence will hold in the limit, the speed of convergence can be very slow. 

If the PDMP is a proper stochastic process, then checking the robust transversal property can be even 
more challenging. In fact, the condition requires us to show that non-transversal activations happen with 

11 A zero of a real valued differentiable function is non-simple if also the derivatives of the function up to order k > 1 arc 
zero in the same point. 
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probability zero. One way to approach the problem is to exploit randomness to our advantage. Suppose 
that, in a given mode q, the continuous state space E q has topological dimension n and that we can show 
that the activation surfaces have (topological) dimension n— 1 and set of points B in the activation manifold 
corresponding to non-transversal crossing has (topological) dimension n — 2 or less. Then, the set of points 
E t of E q such that x(i) € B if x € E t has dimension n — 2 (it is the continuous image of B under the flow 
of the vector field for — t units of time) so that the subset Eb C E q of initial points for which x(i) hits B 
has dimension n — 1. If we can further prove that the distribution at each time t of the PDMP is absolutely 
continuous with respect to the Lebesgue measure (i.e. P(A) = for each Borel set A of Lebesgue measure 
0), it necessarily follows that the probability that x(t) € B is zero (as Eb has Lebesgue measure zero). This 
last property can be enforced by requiring that the initial conditions and the reset kernels are absolutely 
continuous probability distributions (e.g. n-dimensional multivariate Gaussian distributions). If the system 
satisfies some conservation law, so that we are interested in its dynamics in a manifold of dimension less 
than n, then we can reduce its dimensionality and analyse the reduced system in the way sketched above. 

Proving that the set B has dimension n — 1 or less, instead, is more challenging in general. If the 
activation function of guards are linear (or analytic), and the vector field is analytic, then one may exploit 
properties of analytic manifolds for this task, studying the set of zeros of the scalar product of the normal 
vector to the activation surface with the vector field (B, in fact, is contained in this zero set). We do 
not pursue this direction any further in this paper, leaving its investigation for future work, with the 
goal of providing (semi-)automatic static analysis procedures to check for the applicability of the hybrid 
approximation method, at least for a practically relevant subclass of population-sCCP models. 

The property of robust activation can be dealt with along the lines sketched above, looking at the 
dimension of the intersection of activation surfaces. In this case, the task should be considerably simplified 
if all guards are linear. 

Example 6.4. We consider now a different scenario, in which we model the spreading of a worm epidemic 
in a computer network. The class of models used for this circumstance is usually drawn from the well 
developed field of epidemiology, and we make no exception to this rule. We will consider a simple SIR 
model [2] , in which each node of the network has three states: susceptible X s , infected Xi and recovered 
X r . Here the size of the system 7^ coincides with the total population TV of nodes, which is assumed to be 
constant, i.e. Xi + X s + X r = N. We assume that infection happens by the malicious action of the worm 
in infected nodes, which try to send infected messages around the network. There is also a small chance 
that infection comes externally from the network. Recovery from an infection is obtained by patching an 
infected computer node. However, after some time new generations of worms appear, and we describe this 
by the loss of immunity of recovered nodes, that return to be susceptible. We assume that only infected 
nodes are patched. The sCCP code for this model is as follows: 

infection = [* — > X[ = Xj + 1 A X' s = X s — l]kiX s Xi/^ N ■ infection + 

[* ->• = Xi + 1 A X' s = X s - l] 7jv fc e . infection 
loss_immunity = [* — > X' r = X r — 1 A X' s = X s + l]k s x r ■ loss_immunity 
patching = [* ->• X[ = Xi - 1 A X' r = X r + l] fep X; .patching 

The patching rate k p is the only controllable activity in the system, and we will use instantaneous transitions 
to model control policies. In particular, we consider here the following policy: if the fraction of infected 
computers is above a threshold ai, we increase the patch rate from to kp. If the fraction of infected fall 
below the threshold ao < cti, we switch back to the normal patching rate. We model this in sCCP by 
introducing a new variable U, taking values or 1, modifying the agent patching as 

patching = [U = X[ = Xi - 1 A X' r = X r + l] fe o x . .patching 
+ [U = 1 X[ = Xi - 1 A X' r = X r + 1] fe i^ . patching 

and introducing the control agent 

control = [Xi/jN > ol\ — > U' = l]oo:i • control 
+ [Xi/jN < ao -> U' = Ojooii . control 
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Figure 8: Comparison of hybrid and stochastic trajectories of the epidemics model of Example 6.4, for 
parameters h = 100, k e = 0.001, k s = 0.1, k° = 0.1, fc* = 2.0 (left) or k\ = 1.0 (right). Control thresholds 
are ao = 0.3 and oi\ = 0.1 (left) or a\ = 0.088 (right). Initial conditions are aij(0) = 0.1 and x s (0) = 0.9. 
Note that for N large the stochastic trajectory is indistinguishable from the hybrid limit one. In the figure 
on the right, the threshold a\ is slightly smaller than the steady state of the ODE when {7=1. However, 
the stochastic system can hit the threshold and change mode, even for 7^ large. In any fixed time horizon, 
this event becomes less and less likely as N goes to infinity: here, we observe a spike for TV = 100000, but 
not for N equal to one million. 



First, note that this model satisfies the scaling assumptions of Theorem 6.1, when all variables except U 
and all stochastic sCCP-transitions are considered as continuous. As there is no stochastic transition, the 
limit model is deterministic. Hence, if we start from a given initial state, we need to check that the single 
trajectory of the limit model satisfies the assumptions. For a given set of parameters, shown in Figure 8, 
we can choose ot\ such that the steady state of the limit model with low patching rate is above a\ and ao 
such that the steady state of the limit ODE model with high patching rate is below ao, inducing oscillations 
in the limit hybrid model. This is confirmed in Figure 8(a), where we can also check visually that the 
crossing of the guard surfaces is always transversal. A formal proof can be given as well, by verifying that 
the projection of the vector field on the orthogonal direction to Xi = a\ or Xi = ao is null in a single 
point, namely (k®/ki, a\, 1 — k®/ki — a\) or [k^/ki, ao, 1 — k^/ki — ao), respectively, and observing that the 
trajectory in Figure 8 never passes from these points. As the conditions of Theorem 6.1 are satisfied, we can 
conclude that the sequence of CTMC models obtained by the sCCP-program for increasing converges 
to the hybrid system. 

In Figure 8(b), we show the same model for a different high patching rate and a different threshold ai, 
such that the limit model in state U = 1 converges to a steady state slightly greater than a%, thus never 
activating the instantaneous transition. We can see that the stochastic model behaves in the same way, but 
for N very large, because the proximity of a\ induces an activation of the transition in stochastic trajectories 
with small (but non null) probability for any N. In particular, this implies that if we leave enough time, 
almost surely a stochastic trajectory will eventually cross the surface Xi = a\, changing discrete mode. This 
shows that the notion of weak convergence is restricted to the transient behaviour, but does not bring in 
general information about the steady state. 

6.1 Time-Dependent Guards 

Guards depending on time on instantaneous transitions can be a valuable addition to the modelling language, 
as they allow us to describe global events that have a duration, which can be either deterministic or 
stochastic. 
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More precisely, we consider the extension of sCCP [19] with a reserved keyword time, referring to 
simulation time, whose usage is confined to guards and update functions of instantaneous transition, and 
to update functions of stochastic transitions, which have to be kept discrete. Moreover, the special variable 
time can never be updated. Specifically, we allow instantaneous transitions of the form [g 7r (X,iime) — > 
X' = rv (X, W, time)]^.^,. In particular, g 7r (X, time) is required to be of the form time > ho(X.) Ag 7r> i(X), 
for some function ho and some standard guard predicate g 7r .i(X) (with activation function /ii(X)). We call 
a population-sCCP model (-4,7) with timed-guards a time- guarded population- sC 'CP model. 

Translation of these transitions to TDSHA is straightforward and follows the same scheme as Section 
2.3. The only difference is that in the TDSHA/PDMP setting it is more convenient to internalize the 
notion of time, by adding a dedicated clock variable keeping track of the global simulation time. This is 
done by adding a new continuous variable, Time, and a new automata, called time-monitor, in the parallel 
composition of TDSHA, with a single continuous transition of the form (lTime, 1) ; where lTime is the vector 
equal to one in the position of the variable Time, and zero elsewhere. 

In the following, we restrict our attention to time-guarded transitions that satisfy the following scaling 
assumption with respect to the population size jn- 

Scaling 5 (Discrete Scaling for Time-Guarded Instantaneous Transitions). A normalized time-guarded 
instantaneous sCCP transition with random reset tt = (g^ (X, time), X' = f ( W )(X, ~W^ N \Y), time), 
pi^(X)) of a population-sCCP program (.4, 7jv) with variables partitioned into (X<j,X c ,X e ), Xe£, has 
discrete scaling if and only if: 

1. The activation function of the guard g( N )(K), which is min{time — h^pt), /i^(X)}, is such that 
h\ N \it) converges uniformly in each compact K C E to a continuous function /ij(X), i = 0,1. 
Furthermore, and h do not depend on the variables X c of X that are modified continuously; 

2. satisfies the same conditions as in Scaling 4; 

3. Resets converge weakly (uniformly on compacts), i.e. for each (xW, fW) — > (x,f) in E x M> , 
r^)(xW,WW(xW)',tW) => r(x,W(x),t), as random elements on E. 

Under the previous scaling, if we consider initial conditions (or the state after one jump) such that 
Xq W ' =>■ X , given the independence of the activation function from variables modified continuously, we 
easily obtain h Q N ^ (Xq 7 ^ ) ho(X-o) (reason as in Lemma 6.1). Recalling that in the limit PDMP, Time 
is treated like a regular continuous variable, we can combine this observation with the discussion about 
exit times and reset kernels in the previous section to obtain convergence. Note, in particular, that the 
activation condition on time has always a robustly transversal activation function (as Time is monotonically 
increasing). Hence, the only problems for convergence of a timed-transition can come from the other 
component of the guard (i.e. from the activation function hi). Therefore, we obtain that the time TW in 
which g( N > becomes true converges weakly to the time T in which g becomes true. Then, a minor adaptation 
of the proof of Theorem 6.1 gives the following 

Proposition 6.1. Let (A,Jn) be a sequence of time guarded population- sC 'CP models for increasing sys- 
tems size, 7at — > oo, as N — > oo, with variables partitioned into X = (Xd,X c ,X e ), with discrete stochastic 
actions satisfying scaling 3, instantaneous actions satisfying scaling 4-, time guarded actions satisfying scal- 
ing 5, and continuous actions satisfying scaling 2. Let X.( N \t) be the associated sequence of normalized 
CTMC and x(t) be the limit PDMP associated with the sequence of normalized TDSHA T(-4, 7a0- 

If X { N) x (weakly) and the PDMP is non-Zeno, robustly transversal, has the robust activation 
property and it is size-compatible, then ~X.( N \t) converges weakly to x(t), X( w ' =>■ x, as random elements 
in the space of cadlag function with the Skorokhod metric. □ 

Example 6.5. As an example of time-dependent guards, we consider again the client-server model with 
breakdown, as in Example 5.2. In that example, we used random resets to model a variable level of damage, 
reflecting in the time needed to repair the system. Here, instead, we consider a single damage level, but 
with a generally distributed repair time. In terms of sCCP model, we need an environment variable, say K, 
representing the time in which the server repair will finish. It will be re-sampled from a given distribution 
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(a) Weibull Breakdown: t = 10000 
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Figure 9: Empirical cumulative distribution of clients requesting service at time t = 10000 and average 
number of clients request service for the client-server model of Example 6.5. The model has a fixing time 
sampled from a Weibull distribution with shape 1.5 and rate 1/1000. Other parameters are as in the caption 
of Figure 3. The bimodality of the distribution is captured, with less variability in the the hybrid model. 
The average is almost indistinguishable. 



each time a breakdown occurs. More specifically, let W be a random variable on the positive reals, with 
cumulative distribution function F(t), independent of the current state. Each time the server breaks, we 
set K to time + W. The server agent now becomes 

server = [* -> X[ = X { - 1 A X' b = X b + 1 A K' = time + W] kbXi ■ server 
+ [time = K -> X{ = X { + 1 A X' b = X b - lj^-i . server 

It is easy to see that this modified model satisfies the scaling conditions of Proposition 6.1, as the guard 
of the times transition is independent of jn- It follows that convergence holds, as can be seen in Figure 9, 
where we consider a fixing time sampled according to a Weibull distribution. 

7 Dealing with Guards Depending on Continuous Variables 

In this section, we look at what happens if we allow guards depending on continuous variables in sCCP 
transitions, which in the limit can either be approximated as continuous or be kept discrete and stochastic. 
This additional feature, which is straightforward from the point of view of the modelling language and 
which poses no problems in the definition of the CTMC semantics, has more complex consequences for 
what concerns the hybrid limits. 

We will first focus on guards on continuous transitions, as these are somehow more delicate to deal with. 
Guards on discrete stochastic transitions, which create problems that are, in a certain sense, analogous to 
those with instantaneous transitions, will be discussed later on in Section 7.4. 

7.1 Guards on Continuous Transitions 

Guards on continuous transitions introduce discontinuities in the vector field. In fact, the rate function A Tr (x) 
of a continuous transition n has to be multiplied by the indicator function of the guard predicate, which 
we assume to be of the form h(k) > 0, obtaining the discontinuous function fV(x) = A T (x) • I{/i(x) > 0}. 
In doing this operation, we leave the world of differential equations, entering into the more intricate realm 
of discontinuous or piecewise-smooth dynamical systems (PWSS) [26, 33] or, more generally, of differential 
inclusions [3]. 
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(a) Transversal motion 



(b) Sliding motion 



(c) Sliding vector field 



Figure 10: Schematic representations of transversal motion across %, of sliding motion along "H, and of the 
geometric construction of the sliding vector field G(x). 

The problem, roughly speaking, is that existence and uniqueness of the solution of an ODE with discon- 
tinuous right-hand-side is not guaranteed even if all rate functions are regular (say Lipschitz continuous) 
and if guards are also described by smooth functions (say differentiable functions). Furthermore, solutions 
can exhibit strange behaviours, like sliding motion (sliding on a discontinuity surface) or chattering (Zeno 
behaviour in crossing discontinuity surfaces). 

The lack of uniqueness, in particular, is problematic in our context, as it is a fundamental condition in 
the definition of the class of PDMP we consider here. Indeed, more general frameworks can be considered, 
like PDMP based on differential inclusions, but we leave the investigation of this direction for future work. 

In this paper, we will follow the treatment of [20], in which the author discusses mean field limits in 
presence of guards, when the limit is a PWSS. A more general approach is that of [35], but we stick to the 
first one as we believe it is more intuitive. In the next subsection, we will briefly give an introduction to 
PWSS, in which we will discuss conditions for existence and uniqueness of solutions. Then, we will turn 
our attention to fluid approximation of those systems and plug these results into our framework. 

7.1.1 Piecewise-Smooth Dynamical Systems 

Consider an ordinary differential equation ^ = F(x). A solution in the classical sense is a (continuously) 
differentiable function x(t) such that g|x(i) = F(x(t)), and x(0) = xo- A classical result is the Picard 
Lindelof theorem [53]: if F is (locally) Lipschitz on a set E C R n and xo is in the interior of E, then there 
exists a unique global solution of the differential equation within E. 

However, here we are interested in dynamical systems in which the right hand side of the ODE can be 
a discontinuous function, possibly undefined, on a set of points of measure zero. This is the setting studied 
in the theory of ordinary differential equations with discontinuous right-hand side [33]. In particular, we 
will consider the so-called switching systems or piecewise smooth (PWS) dynamical systems [26, 32]. Let 
F : E — > M. n , with E C K™, and suppose there exist a finite set of domains TZi, i = 1, . . . , s, such that F is 
smooth (or at least Lipschitz) on TZi, the closure oiTZi, and \jTZi 2 E. Notice that F can be discontinuous 
only on the boundaries dTZi of the regions TZi, so that the discontinuous set is Ti = (J dTZi and it has measure 
zero. 

In the following, we will briefly sketch some basic notions of these systems, which we will need in the 
following, starting from the concept of a solution. In fact, given that the vector field is discontinuous, we 
cannot look anymore for solutions which are continuously differentiable functions. Therefore, we will look 
for solutions among absolutely continuous functions, i.e. continuous functions which are equal to the integral 
of another function [52] and are henceforth differentiable almost everywhere. 

In order to define such solutions, we will lift the function F to a set valued function F, F(x) C K n , 
known as the Filippov extension of F. Then we define a Filippov solution as an absolutely continuous 
function x(f) such that x(0) = xo and ^x(i) € F(i(t)) almost everywhere. That is to say, we replace 
the discontinuous differential equation by a differential inclusion [26, 3]. More specifically, we define F(x) 
as cojlinifc^oo -F(xj.) | x^ x, x^ ^ Ti}, where co denotes the convex closure of a set. Notice that for 
each continuity point x of F, F(yi) = {_F(x)}, so that we have a proper differential inclusion only in the 
discontinuity region Ti. 
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For simplicity, consider a PWS system constituted by only two regions IZi and TZ 2 , and let x e H be a 
point of discontinuity of the vector field. Furthermore, suppose that F equals the function F\ on TZi and 
the function F 2 on 1Z 2 , and that F\j(x) < F 2 j(x). In this setting, Fj(x) = [Fij(x), F 2 j(x)]. 

The existence of a solution, starting from a point Xo, is guaranteed under mild conditions on the Filippov 
extension F of F [33]: F must be (locally) bounded 12 and upper semicontinuous 13 . Consider again a PWS 
system with two regions 1Z\ and TZ 2 , as above. Then, existence is guaranteed if functions F\ and F 2 are 
continuous on IZi and TZ 2 . 

In order to understand the behaviour of a PWS dynamical system on a discontinuity point of the vector 
field, we restrict our attention to the two regions system (which is a good local model, unless a discontinuity 
point belongs to the boundary of more than two regions), further assuming that IZi and 1Z 2 are separated 
by a smooth surface %. In particular, H is defined as T-L — {x | h(x) — 0}, where ft- is a function with 
continuous second order derivatives, while IZi = {x | h(x) > 0} and 1Z 2 = {x | h(x) < 0}. We further 
require that Vft(x) 7^ for each point x <G Ti, so that the normal vector n(x) = Vft(x)/||Vft.(x)|| is always 
defined for the surface Ti, and always points into TZ\, see Figure 10. 

To understand the behaviour of a trajectory when it hits the surface H, consider a situation in which the 
solution is in the interior of TZ 2 and hits T-L in x at some time t. Then, two things can happen, depending 
on the relative orientation of the vectors Fi(x) and i*2(x) with respect to %. In particular, as x(i) hits 
% from 1Z 2 , the vector F 2 (x) must point towards 1Z\. If also the vector Fi(x) points towards 1Z\, then 
the trajectory x(t) crosses the surface Ti, possibly with a discontinuity in its derivative. This phenomenon 
is called transversal motion, see Figure 10(a). Alternatively, the vector Fi(x) may point towards 1Z 2 . In 
this case, the trajectory cannot enter 1Zi, as it will be pushed immediately back to %, but, symmetrically, 
it cannot also remain in 1Z 2 . Therefore, the motion is confined in the discontinuity surface W. This kind 
of behaviour is known as sliding motion, see Figure 10(b). In particular, the trajectory x(i) follows the 
solution of the vector field tangential to % obtained by selecting the only vector in F(x) tangential to H, 
see Figure 10(c). More precisely, the sliding motion is defined by the differential equation ^x = G(x), 
where G is the vector field (x) = a(x)fi(x) + (1 — a(x))/ 2 (x). The value of the weight coefficient a(x) is 
obtained by requiring that n T (x)G(x) = (i.e., that G(x) is tangential to %), obtaining 

= nT(x)F 2 (x) 
{ ' n T (x)F 2 (x)-n T (x)f\(x)' 

where n T (x)Fi(x) is the projection of -Fi(x) along the normal vector n(x) of H in x. Sliding motion 
continues until one (and only one) of the two vectors fields, say Fi, becomes tangential to %. In this 
case, the motion continues in the region 1Z\. The condition that only one vector out of -Fi(x) and F 2 (x) 
becomes tangential to H is known as the first order exit condition of the sliding motion [32]. If both F\ 
and F 2 become tangential, then the motion continues on a submanifold of %, but we do not consider these 
situations in this paper, which can, however, be treated similarly to the motion in the intersection of the 
boundary between three or more regions [32]. Hence, from now on we tacitly assume that sliding motion 
terminates with first order exit conditions. 

In general, if we are in a point x of the behaviour of a solution starting in x depends on the values 
of n T (x)Fi(x) and n T (x)F 2 (x): 

• If both n T (x)Fi(x) and n T (x)F 2 (x) are non-zero and have the same sign, then there is a transversal 
crossing of the surface. 

• If n T (x)Fi(x) < and n T (x)_F 2 (x) > 0, we have a stable sliding motion along W. 

• If n T (x)Fi(x) > and n T (x)_F2(x) < 0, we have an unstable sliding motion along T-L. 

• If only n T (x)Fi(x) = 0, then the trajectory continues in the region pointed by ^(x), and similarly 
for n T (x)F 2 (x) = (tangential crossing). 

12 A set function F is locally bounded at x g R n if there exists e > and Mj > such that ||z|| < Mj for each z 6 F(y) 
and y g B(x, e). 

13 A set function F is upper semicontinuous at x g R n is for each e > there exists S > such that F(y) C F(x) + B(0, e) 
for each y g B(x , <5) . 
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For the scope of this paper, the uniqueness result for PSW systems plays a relevant role. More precisely, 
Filippov [33] proved that there is a unique solution starting in x e H, provided that at least one of 
n T (x)Fi(x) < and n T (x)F 2 (x) > holds. Notice that this condition rules out unstable sliding motion. 
There is also a condition for existence and uniqueness expressed in terms of differential inclusions, which 
requires the set-valued function F to be one-sided Lipschitz continuous 14 . We remark that the global 
existence and uniqueness of a solution allows us to define a scmiflow cf>(t, x) for the discontinuous vector 
field F, which is the condition required in the definition of a PDMP adopted here. 

7.2 Deterministic Approximation for PWS Limits 

We will now present the limit result of [20] in the framework of this paper, and then plug it in the proof 
of Theorem 5.1 in order to extend the hybrid convergence limit to this discontinuous setting. We start by 
expanding Scaling 1 to deal with guards. 

Scaling 6 (Continuous Scaling with Guards). A normalized guarded sCCP transition tt — (gi^^X), X' = 
X + i>i N) , \[ N \±)) of a population-sCCP program (A, 7jv), with E the domain of normalised variables X, 
assumed to be all continuous, has continuous scaling if and only if: 

1. The rate and update satisfy the conditions of Scaling 1; 

2. g^(X) is of the form ft W) i(X) > OA ... A /^^(X) > 0, where each h w j is independent of N and has 
continuous second order derivatives, with Vh n j(x.) ^ for all x e {x | h(x) = 0}. 

Consider now a population-sCCP program (A, 7jv), with no discrete variables and no instantaneous 
transitions, and with stochastic actions satisfying either Scaling 1 or 6 (hence, all actions will be approx- 
imated continuously). Then, we can compute its drift according to equation 5, which defines a piecewise- 
smooth system: 

fW(X) = ^E[aife(X)}f(X) 

Notice, in particular that, as the guards are independent of N, it holds that — > F uniformly, where F 
is defined by 

F(X) = ^E[^]l{ g7r (X)}i(X). 

TT 

Let us take a closer look to the PWS systems ^x = F(x). Each transition of the model having a 
non-trivial guard, partitions the state space in several regions. In fact, if the predicate g^ is a conjunction 
of inequalities defined by smooth functions h n j, then each such function partitions the state space in two 
regions: •, where h^j is positive, and T^ j, where h n j is negative. Therefore, in order to define the 
PWS system, we have to consider all possible intersections of regions Vf- and 1ZJ , for all distinct function 
hj appearing in guards of transitions. If there are Too such functions, then we have 2 m ° distinct regions. In 
practice, however, many transitions usually have trivial guards, and there may be transitions sharing the 
same functions hj, so that this number should be reasonably small. In the following, we indicate by Hj the 
manifold defined by hy. Hj = {x £ R" hj(x) = 0}. 

In the rest of the paper, we require that the PWSS defined by F is globally regular, in the following 
sense: 

1. solutions exist globally in E, and are unique, so that the PWSS admits a semi-flow on E; 

2. sliding motion never happens on the intersection of more than one surface, and has first order exit 
condition; 

3. the PWSS has no Zeno trajectories, i.e. the number of transversal crossings and traits of sliding 
motion is finite in each compact time interval [0, T], for each trajectory of the PWSS. 

14 A set valued function F is one-sided Lipschitz if and only if for each xi,X2 6 E and yi £ -F(xi), y2 S F(x2), it holds that 
(xi — X2)* • (yi — y2) < £||xi — X2||, for some L > 0. 
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These conditions are essentially those introduced in [20], just extended to the whole domain E. 15 If the 
PWSS is regular, then each of its trajectories is regular and therefore the following theorem holds: 

Theorem 7.1. Let {A,^n) be a sequence of sCCP models for increasing systems size, satisfying the 
conditions of Theorem 4-1, with all actions tt satisfying either Scaling 1 or Scaling 6. Let ~XS N \t) be the 
associated sequence of normalized CTMC, and assume x^^O) — > x (in probability /almost surely). 

Let x(i) be the solution of the regular PWSS system J^x = F(x) starting in point x(0) = x € E. Fix 
a finite time horizon T < go, Then 



lim sup 



X w (t)-x(i) 



= in probability. 



□ 

As an immediate corollary, we get that if x^' =>■ x , then => x as random elements in the space 

of cadlag functions. 

The intuition behind the proof of the theorem is that each regular trajectory in [0, T] of the PWSS can 
be sliced into a finite number of pieces, such that each piece is either the solution of a standard ODE within 
a continuity region of the vector field, or it is a sliding motion along a discontinuity surface. The idea is 
to prove the convergence of the sequence yiS N \t) to x(t) in each piece separately, either using standard 
deterministic approximation (Theorem 4.1), or using a specialized version of such a result for sliding motion 
(Theorem IV. 2 in [20]). Then, one simply proves convergence in [0, T] by combining the convergence in each 
piece, exploiting convergence of exit times. Sliding motion is the difficult case, because the trajectory of the 
PWSS evolves according to the sliding vector field, which is different from the drift F^ N "> of the sequence of 
CTMC. 

Differently from [20], we are not requiring that rate functions are globally bounded and Lipschitz on E, 
but they just satisfy these properties locally. However, the validity of the previous theorem depends only 
on a compact neighbourhood of the trajectory up to time T, and on the assumption of global existence 
of solutions. Furthermore, we are also allowing random increments, which can be dealt with exactly as in 
Theorem 4.1. 

We also note that we could have relaxed the scaling condition on guards by making guard functions hj 
depend on N, and assuming that they converge uniformly to a limit function, with the same properties. 
The previous theorem would still hold, but with some modification with respect to the proof of [20]. 16 



7.3 Hybrid Limit with Guarded Continuous Transitions 

The previous theorem can be easily plugged in the framework of Section 5, replacing Kurtz theorem in the 
proof device of Theorem 5.1. This can be done under the regularity assumption of the limit PWSS in each 
mode of the PDMP, i.e. for each possible combination of values of discrete variables. We call PWS regular 
such a PDMP. The reason is that the proof of Theorem 5.1 relies only on the weak convergence implied by 
Kurtz theorem and on the continuity of limit trajectories, which are also satisfied by the subset of PWSS 
considered here. Furthermore, as the treatment of instantaneous transitions or time-dependent guards in 
Section 6 is also independent of the fine-grained details of the continuous dynamics, we can also include 
those kinds of transitions. Notice that the notion of non-Zeno PDMP and those of robustly transversal 
PDMP, robust activation property and size-compatible PDMP, extend automatically to this PWS setting, 

15 This regularity condition can be simplified, if the problem is restated in terms of differential inclusions and we require 
that the set-values extension F of F is one-sided Lipschitz. Under this milder assumptions, the following theorem still holds. 
However, we stick here to the formulation in terms of PWSS, which is more natural in the context of PDMP. 

16 Thc proof becomes more involved because if guards are varying with N, F( JV ) does not converge uniformly to F any more. 
Essentially, one proves that the trajectories of the PWSS defined by the discontinuous vector field F( N ) converge uniformly 
in each [0, T] to the trajectories of F, and that converges in probability, uniformly in [0,T], to x( N \ the solution of 

d/dt x( JV >(t) = F( JV )(x( JV )(i)). To show that xW is regular, one relies on the regularity of the limit trajectory x, and on the 
uniform convergence of activation functions of guards and of the components of the vector field. To show convergence of 
to x^-*, one cither invokes an obvious modification of Theorem 4.1, or modifies the proof for sliding motion in [20] using 
again the uniform convergence of guard's functions and of rates. Alternatively, one could work with differential inclusions, as 
in [35]. 
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as they only depend on the existence of the semi-flow of the continuous dynamics of the PDMP. Before 
stating the theorem, we need to extend Scaling 6 to the hybrid setting, as done for scaling 1. 

Scaling 7 (Hybrid Continuous Scaling with Guards). A normalized guarded sCCP transition it = (gt^^X), 
X' = X + vi N \ \i N) (±)) of a population-sCCP program (A,jn), with variables Xe£ partitioned into 
(Xd,X c ,X e ), has hybrid continuous scaling if and only if: 

1. The rate and update satisfy the conditions of Scaling 6; 

2. g^(X) = g^2(Xd, X e ) A g7r, c (X c ), where g,r iC (X c ) satisfies the condition of Scaling 6. 
Therefore, we have the following: 

Proposition 7.1. Let (A, 7jv) be a sequence of time- guarded population- sC 'CP models for increasing sys- 
tems size 7at — > oo, as N — > oo, with variables partitioned into X = (Xd,X c ,X e ) ; with discrete stochastic 
actions satisfying scaling 3, instantaneous actions satisfying scaling 4-, time guarded actions satisfying scal- 
ing 5, and continuous actions satisfying either scaling 2 or scaling 7. Let ~XS N ^ (t) be the associated sequence 
of normalized CTMC and x(i) be the limit PDMP associated with the limit normalized TDSHA 7~(A). 

If x. ( Q N) x (weakly) and the PDMP is non-Zeno, robustly transversal, has the robust activation 
property, is size-compatible and PWS regular, then X.( N \t) converges weakly to x(i), XW =>■ x, as 
random elements in the space of cadlag function with the Skorohod metric. □ 

Example 7.1. We consider a variant of the computer network epidemic model of Example 6.4. In particular, 
we consider the following normal and emergency patching policies. Under the normal policy, patches are 
applied at constant rate kp. Under the emergency policy, instead, computers in the network are patched 

with a rate kp > hp if the fraction Xi of infected nodes is above a threshold a, and at rate kp < hp, kp > kp, 
if the fraction of infected nodes is below a. The emergency policy is initiated as soon as the fraction of 
infected nodes becomes greater than a threshold j3 > a, and is executed for w e R + units of time. We will 
use an environmental variable K to remember the next firing time of such a delayed transition. When the 
emergency policy is aborted, the normal policy is restored. We can model this policy in sCCP by suitably 
modifying the code of Example 6.4, with particular regard to the patching and the control agents. The 
variable U is the discrete variable modelling the patching policy: U — 1 indicates the normal policy, while 
U = 2 indicates the emergency one. 

patching = [U = 1 -> X[ = X t - 1 A X' r = X r + l] fe i x . .patching 

+ [U = 2 A Xi > a -> X< = Xi - 1 A X' r = X r + l] fe 2 X . .patching 

+ [U = 2 A X t < a -> X[ = X, - 1 A X' r = X r + l] fe 3 X . .patching 

control = [Xi >j3^-U' = 2/\K' = time + w]oo : i . control 

+ [time > K — >• V = l]oo:i • control 

The limit model is a PWSS when U = 2. In Figure 11(a), we show a trajectory of the PWSS exhibiting 
sliding motion on the plane Xi — a. It is easy to check that the PWSS has a unique solution from any 
initial state. In fact, taking the scalar product of the two vector fields F\, with the normal to X; t = a 
on the two sides of the plane Xi = a, we obtain kiaX s — k^ct and kiCtX s — kpOt. Now, kiCeX s — fc^a > 

for X s > kp/ki. But as kp > kp, for X s < kp/ki, we have that kiaX s — k^a < akp — k^a < 0, which shows 
that the uniqueness condition is verified. Also in this case, the hybrid limit model is deterministic, and we 
can see its trajectory from a fixed set of initial conditions in Figure 11. Inspecting this trajectory, we can 
easily convince ourselves that the crossing of activation surfaces is always transversal, so that we can apply 
Proposition 7.1. 



7.4 Guards on discrete stochastic transitions 

In this section, we consider a relaxation of Scaling 3, in which we allow guards on discrete stochastic 
transitions to depend on continuous variables. Similarly to continuous transitions, this extension introduces 
discontinuities in the rate functions of the PDMP. Intuitively, as the jump time distribution is obtained by 
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Figure 11: Comparison of single trajectories of the limit PDMP and the CTMC model for system size 1000, 
for the control policies of the network epidemic models of Example 7.1 (left) and Example 7.3 (right). The 
behaviour is essentially the same (modulo randomness of switch times in the figure on the right). Parameters 
are as in the caption of Figure 8. Additionally, in the model of Example 7.1, we have kp = 0.05, k^ = 4.0, 
kp = 0.5, a — 0.1, (3 — 0.3, and the duration of the emergency policy is 10 units of time, the rate of switch 
from emergency to normal patching policy in the model of Example 7.3 is 0.1. 



the cumulative rate, i.e. by integrating the rate function, these discontinuities should not create problems, 
as far as the trajectories of the PDMP do not remain in a discontinuity surface of a rate for too long. 
Essentially, problems emerge if a continuous trajectory of the PDMP slides on the discontinuity surface of 
a guard for some time interval [ii,^]- Suppose that on the surface the guard is false, hence the transition 
of the PDMP cannot fire. In this case, even if trajectories of the CTMC converge to the one of the PDMP, 
they may remain on the "wrong" side of the discontinuity surface, i.e. on the side in which the guard is true 
and the transition active, so that the event can fire in the sequence of CTMC. If this transition determines 
the fate of the system, than convergence to the PDMP can fail quite dramatically. 
To explain better the problem, we consider the following example. 

Example 7.2. Consider a simple sCCP model of a random walk in 1 dimension, for variable X, initially 
set to zero. Variable Z instead, can take values and 1, and it is the fate variable. Initially it is set to 
zero, and it may become 1 by the firing of a stochastic transition with rate 1, but active only if X > 0. The 
sCCP program has initial configuration random_walk |j doom, where 

random_walk = [* — >■ X' = X + 1] 7JV . random_walk 
+ [* — >• X' = X — 1] 7JV . random_walk 
doom = [X>Q-»Z = l]i.O 

Notice that the rate of the transitions of the random_walk agent grow with 7jv, hence they can be 
approximated continuously, and, once normalized, induce the drift F^ N '0C) = F(X.) = 0. Hence, the limit 
PDMP model has quite boring continuous dynamics, in fact a constant one on the discontinuity surface 
x = 0. It follows that the discrete stochastic transition will never fire in the PDMP model, and Z will remain 
0. However, for any N, the CTMC model will spend half of its time on the subspace X > 0, meaning that 
the doom agent will eventually fire its transition, on average in 2 time units. Hence Z will be equal to 1 with 
probability going to 1 as T increases. Hence convergence does not hold for this model. Notice, however, 
that if we set the initial value of x to — e, for e > 0, then convergence will hold. In particular, by the Kurtz 
theorem, for any time T < oo, the CTMC X^(t) will be smaller than — e/2, in [0, T], for N large enough, 
hence the doom transition will not fire in [0, T]. However, it will eventually fire for any N, as with probability 
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one, X( w ) (t) will get eventually above zero and remain there for a long enough time. In addition, the time 
when this event will happen is pushed further and further into the future as N grows. This does not create 
problems for weak convergence, as the Skorohod metric on which weak convergence is based discounts the 
future and will give a smaller and smaller weight to the difference of Z-values as N grows (see Appendix C 
for details on the Skorohod metric). 

From the previous discussion, it should be clear that the problems in introducing guards for discrete 
stochastic transitions are somehow related to the way the flow of the vector field interacts with the discon- 
tinuity surface of the guards. This suggests the following definition: 

Definition 7.1. A cadlag function x(t) taking values in E is robustly compatible with the activation function 
/j(x) of a guard predicate g(x) if and only if the set {t > | h(x(t)) = 0} has Lebesgue measure zero. 
A PDMP is robustly compatible with a guard gjr(x) if almost surely its trajectories are robustly compatible 
with the activation function of g^x). 

A PDMP derived from a TDSHA T is robustly compatible if and only if it is robustly compatible with all 
guards of discrete stochastic transitions of T. 

We can now introduce the scaling condition for guarded discrete stochastic transitions. 
Scaling 8 (Discrete Scaling for Guarded Stochastic Transitions). A guarded normalized sCCP transition 
with random reset, tt = (g (N)7 r(X), X' = fW(X, WW(X)), \i N) (X)) of a population-sCCP program 
(A,jn) w hh variables Xe£ partitioned into (X d ,X c ,X e ), has discrete scaling if and only if: 

1- gl^^X) has activation function h^ N \"K.), converging uniformly in every compact subset K of E to 
the continuous function ft-(X); 

2. and satisfy the same conditions as Scaling 3. 

Technically, the condition of robust compatibility of a PDMP is needed to prove the convergence of 
the stochastic jump times and of the states after the reset. The problem here lies in the fact that, on 
the surface {h(x) = 0}, the reset kernel R of the PDMP is discontinuous, and a sequence of points 
approaching x e {h(x) = 0} may activate a different subset of guards for each BS N \ This may lead to 
radically different behaviours and compromise convergence. This problem is essentially the same as we had 
with the discontinuity of the reset kernel for instantaneous transitions. The robust compatibility condition 
permits us to ignore such points, as there is probability zero of jumping from them. With these assumptions 
in force, we get the following proposition, whose proof can be found in Appendix D. 

Proposition 7.2. Let (A,Jn) be a sequence of 'population- sC 'CP models for increasing systems size — > 
oo, as N — > oo ; with variables partitioned into X = (Xd,X c ,X e ), with discrete stochastic actions satisfying 
either scaling 3 or scaling 8, no instantaneous actions, and continuous actions satisfying scaling 2. Let 
~KS N >(t) be the associated sequence of normalized CTMC and x(t) be the limit PDMP associated with the 
normalized limit TDSHA T(A). 

If Xq N ^ xo (weakly) and the PDMP is non-Zeno and robustly compatible, then X("'(t) converges 
weakly to x(t), x, as random elements in the space of cadlag function with the Skorohod metric. □ 

Example 7.3. We consider again the computer network epidemics model of Examples 6.4 and 7.1. In 
particular, we modify the control policy with respect to Example 7.1 by assuming that the emergency 
policy is dropped in favour of the normal one in an exponentially distributed time, with rate kd, provided 
Xi is below f3 2 < P- The control agent then becomes 

control = [U = 1 A X { > f3 U' = 2] TO:1 . control 
+ [U = 2 A Xi < f3 2 -> U' = l] kd ■ control 

Furthermore, we assume that there is a single emergency patch rate kp > k^, so that the patch agent is 

patching = [U = 1 -> X[ = X t - 1 A X' r = X r + l}k^x, .patching 
+ [U = 2 -)• X[ = X l - 1 A X' r = X r + l} k l Xi .patching 



41 



If we take the scalar product between the vector field and the normal li to the plane Xi — fa — and set it 
to zero in the plane, we get the equation kiX s — kp = 0, which has only one solution (kp/ki, fa, 1 — kp/h — fa) 
if kp/ki + fa < 1, and no solution otherwise. In particular, it follows that the trajectories of the vector field 
do not slide on Xi = fa, hence the PDMP is robustly compatible. Thus, the model satisfies the hypothesis 
of Proposition 7.2, and convergence to the hybrid limit holds. 

Remark* 7.1. In order to check the robust compatibility of a PDMP with respect to a guard of a stochastic 
transition, we can proceed similarly to Remark 6.1, by applying the randomization trick. In particular, the 
property holds if we can show that the set of trajectories of the vector field sliding on the discontinuous 
surface has dimension n-1 or less, 17 where n is the number of continuous variables, and if initial conditions 
and resets are absolutely continuous with respect to the Lebesgue measure. If guards are linear, this check 
should be relatively easy to carry out, see Example 7.3. 

7.5 Collecting all results together. 

In this subsection, we collect in a unique statement all the approximation results spread throughout the 
paper. 

Theorem 7.2. Let (A,Jn) be a sequence of time-guarded populations C CP models for increasing system 
size 7at — >• oo, as N — > oo, with variables partitioned into X = (X^, X c , X e ), satisfying the following scaling 
conditions: 

1. discrete stochastic actions with guards not depending on continuous variables satisfy scaling 3; 

2. discrete stochastic actions with guards depending on continuous variables satisfy scaling 8; 

3. instantaneous actions satisfy scaling 4; 
4- time-guarded actions satisfy scaling 5; 

5. continuous actions with guards not depending on continuous variables satisfy scaling 2; 

6. continuous actions with guards depending on continuous variables satisfy scaling 7; 

LefX.( N \t) be the associated sequence of normalized CTMC andx(i) be the limit PDMP associated with 
the limit normalized TDSHA 7~{A). If 

1. the PDMP is non-Zeno; 

2. the PDMP is robustly transversal, has the robust activation property, and is size-compatible (for 
instantaneous transitions); 

3. the PDMP is robustly compatible with all guards of discrete stochastic actions; 

4- the PDMP is PWS regular (if continuous transitions guarded by continuous variables are present); 
5. Xq^ => xq (weakly) 

then ~X.( N \t) converges weakly to x(t), X x, as random elements in the space of cadlag function with the 
Skorohod metric. □ 

17 This holds, for instance, if the set of zeros of the scalar product of the normal to the discontinuity surface of the guard 
with the vector field has dimension n — 2 or less. 



42 



8 Conclusions 



In this paper, we discussed hybrid limits of Markov population processes, described as stochastic concurrent 
constraint programs. We considered the limit behaviour when only a subset of system variables, corre- 
sponding to populations of growing size, is approximated continuously, while the other variables remain 
discrete. We proved that the sequence of CTMC for increasing population size converges to a stochastic 
hybrid system, described as a Piecewise-Deterministic Markov Process. 

We first considered the simplest case, in which continuous transitions, i.e. those becoming fluxes of the 
limit vector field, satisfy the standard scaling a la Kurtz, while discrete transitions are stochastic and have 
continuous rates and reset kernels. 

We then extended these results by including several sources of discontinuity in the evolution of the 
system: instantaneous transitions in the sCCP program, which induce forced transitions at the PDMP 
level, and guards depending on continuous variables in continuous and stochastic transitions. In all these 
cases, discontinuities create potential problems in the interactions between the deterministic vector field 
in each mode of the PDMP and the discontinuity surfaces of the involved functions. Essentially, the reset 
kernels become non- Feller (i.e. non-continuous), and one has to impose additional conditions to enforce 
convergence of the times at which transitions fire and the states of the system after a jump. In general, the 
conditions required can be quite difficult to check. However, in practical cases the geometry of discontinuous 
surfaces should be reasonably simple (mainly linear hyperplanes), hence checking the required conditions 
should not be too hard. 

Nevertheless, it is unlikely that one can find general algorithmic procedures to check automatically if a 
sequence of models is amenable of hybrid approximation, unless there is no source of discontinuity and rate 
functions and resets satisfy further constraints, see Remark 5.3. 

The moral is that one should avoid introducing too many discontinuities in a model, if approximation 
results are needed to perform more efficient analysis. 

In this direction, we are investigating some relaxation techniques. In most of the cases, boolean conditions 
may be replaced by smooth counterparts without altering significantly the model behaviour. For instance, 
instantaneous events may be replaced by stochastic ones with a rate that changes continuously from zero 
to a very large value in the proximity of the activation surface, as done in [1]. Moreover, guards in discrete 
and stochastic transitions may be replaced by sharp sigmoid functions modulating the rates. However, this 
operation is not always possible without introducing spurious behaviours. In this case, one has to verify 
additional regularity conditions before using hybrid approximation. 

Another strategy to simplify the conditions required for the hybrid limits, similar in spirit to the previous 
one, is to introduce randomness in the continuous evolution. In particular, we could replace the vector field 
of the PDMP by a stochastic differential equation, obtaining a Stochastic Hybrid System in the sense of [22] . 
The simplest possibility is to perturb the trajectories of the vector field with Gaussian noise, obtaining the so- 
called central-limit approximation, for which a limit result analogous to Theorem 4.1 exists (see [42], Chapter 
11). This fact guarantees that convergence proofs presented in this paper extend straightforwardly to this 
new setting. Furthermore, in doing this, we get the advantage of removing the bad behaviours happening at 
the discontinuity boundary. Intuitively, in the central-limit regime, the probability of a trajectory to slide 
on a surface of dimension n— 1 is zero. Similarly, the probability that a trajectory tangentially hits a surface 
is zero. It follows that in this setting, most of the additional conditions on PDMP required for convergence 
hold almost automatically. On the downside, simulating a SDE is more expensive from a computational 
point of view, although the regularity of Gaussian Processes may be exploited to improve efficiency. We are 
currently investigating this direction. 

We also plan to investigate the definition of algorithms to check the conditions for convergence and to 
suggest a partition of variables into discrete and continuous (also for models in which the dependency on 
system size is not explicit). 

An important question related to approximation theorems is if the weak convergence result, which are 
limited to the transient dynamics, can be extended to steady state. In the deterministic case, this can 
be done only in a limited number of cases. Essentially convergence of steady state depends on the phase 
space properties of the limit ODE, and it is guaranteed only in presence of a unique globally attracting 
steady state, see e.g. [5, 7]. In the future, we would like to investigate if similar results can be found for 
the hybrid limit case. The situation in this case is more complex. On the one hand, if the limit process 
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is deterministic, then we can exploit recent results [6], provided we can characterise invariant measures for 
deterministic hybrid systems. On the other hand, if the limit process is stochastic, one has to prove that it 
does indeed have a steady state, and extend the result on invariant measures from the deterministic case to 
the stochastic one. In this setting, computing the invariant measure of the PDMP can be quite challenging 
in itself [27]. 

Finally, we want to understand what happens if we include non-determinism in the framework, especially 
in terms of uncertainty on parameters. This would require consideration of stochastic hybrid systems 
combining differential inclusions [3] with imprecise probabilities [63]. 
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A Notation 

Here are some notational conventions followed throughout the paper. 

• We denote the closure of a set A by A and its boundary by d A. 

• We denote with R the reals, Z the integers, N the natural numbers, and M>o the non negative reals. 

• Q indicates a countable subset of R fe , the set of modes. E is the state space of the fluid limit (in this 
case E CM") or of the hybrid limit, and in this case E C Q x W l . 

• n is the dimension of continuous variables, k is the dimension of discrete variables, m is the dimension 
of the full vector of variables. 

• X denotes the non-normalized vector of variables. We assume an ordering of variables, so we can 
interchange sets and vectors of variables freely. Y or X c are vectors of non-normalized continuous 
variables, while Z or X d are vectors of discrete variables. 

• x, y, z denote points in the non- normalized space K m . 

• X denotes the normalized vector of variables, taking values in E. Y or X c are vectors of normalized 
continuous variables. 

• x, y, z denote points in the normalized space E. 

• Given a vector function / : R™ — > R n , with ./(X) wc indicate the variables it depends to, while with 
/[A](X) we indicate the function corresponding to variable X. 

• 7 and denote the system size. 

• (.4, 7jv) is a population-sCCP program. T(A) is the TDSHA associated with sCCP program A. 
T(A,jn) is the TDSHA associated with the normalized population-sCCP program (A,^n) and 
T(A) is the limit normalized TDSHA. 

• 7r denotes a sCCP action, while r\ denotes a TDSHA transition. 

• X.( N \t) denotes a CTMC associated with a population sCCP model with system size 7 at, while 
X("'(f) denotes the corresponding normalised CTMC. 

• x(£) denotes either the (normalised) limit PDMP or the fluid limit. 

• With B e (k) we indicate the ball of radius e centred in x, while £? e (x([0, T]) = Ute[o t] 

• Ti is the jump time of the i-th stochastic event in the PDMP. Q is the jump time of the i-th instan- 
taneous event. T, is the jump time of the i-th event. 

B Additional material on sCCP 

In this appendix, we provide a few additional details about sCCP. We start by formalising the construction 
transforming a generic sCCP program A into a flat sCCPmodel flat(A). 

Definition B.l. Let A = (A, Def , X, T>, x ) be a sCCP program. Its flattened version flat(A) = 
(A + , Def + , X+, V + , x + ) is constructed as follows: 

• We add a new variable for each component C £ Def: X + = X U P, with P = {Pc | C G Def} taking 
values in N. 

• Each component C is replaced by a component C + . If C = n.A + M, with 7r = [<?(X), w(X, X', /x)]a(x, 
then C+ =tt+. C+ + M+, with n+ = [.g+(X), u+(X, X', ( u)] A+(x) . 

• The guard of tt+ is g+(X) = g(X) A P c > 0. 
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• The update of tt+ is w+(X,X',/x) = u(X,X',/i) AP^F C -1 + #(C, A) A A Cl #c P Ci = P <? a + 
#(Ci, A), where #(C, A) is the number of occurrences of component C in the parallel composition A. 

• The rate of tt+ is A+(X) = P c ■ A(X). 

• If 7T is an instantaneous action, then oo : p(x) becomes oo : Pc • p(x) in ir + . 

• The initial value of variables x + equals x for variables in X and #(C, A) for each variable Pc- 

• the initial network of flat (A) is A + Hlc+eDef+ 

C+. 

The notion of flattening has been previously defined in [21], but was called the extended version of a 
sCCP program. In [18] we also showed that a sCCP program and its flattened version have isomorphic 
labelled transitions systems, hence they are stochastically equivalent. 

Example B.l. We illustrate this transformation with a simple example. Consider a simple model of a 
population of bacteria, in which each bacterium can consume a source of food and reproduce, or die. Both 
actions happen after some exponentially delayed time. We can model this in sCCP by using a single integer 
variable F, representing the available food, initially set to /o, and by describing each bacterium as an agent 
as follows: 

bacterium = [F > -> F' = F — l] kr . (bacterium || bacterium) + [*^*] fed .O 

The initial network bacterium || ... || bacterium consists of to copies of the agent. This model can be 
flattened by introducing a new variable, B, counting the number of bacteria in the system, initially set to 
to, and replacing the agent bacterium by 

bacterium_flat = [F > A B > -» F' = F - 1 A B = B + l] kr -B .bacterium_f lat 
+ [B > -> B' = B - l]k d -B .bacterium_flat 

The new initial network will contain only the agent bacterium_f lat. Notice how the rates are updated to 
take into account the shift of perspective from the single agent to the population view. 

C Background Material 

In this section we briefly recall some notions that are needed in the proofs. 

Hybrid state space. Let Q be a countable subset of R k , and consider QxR™, the hybrid space. A point 
x e Q x R n is a pair x = (q,y), y_G K". 

In Q x R", we define a metric d for which Q x R" is a complete and separable metric space. This metric 
is derived from the euclidean metric d in R™ by 

j/ x n = f d(yi,y2)/(l + d(yi,y2)) if = (g»,y») and q 1 = q 2 , 
\ 1 if x, = (quyi) and q 1 ^ q 2 

In particular, rf(xi,X2) < 1 if and only if X! and x 2 have the same Q-coordinate. Hence, a sequence 
converges with respect to d, xjv — > x, if and only if x = (q, y), xjy = (q, yjv) for N > N 0} and y^v — > y in 
R". 

Each subset A of Q x R™ is of the form A = \J qeQ {q} x A q , A q C R™. A sub-base for the topology of 
Q x R" is given by the open balls of the form {q} x B E (y). The boundary of a set A is denoted by dA and 
the closure by A. Borel sets B for Q x R n are defined from the Borel sets B q of R™ as B = U qeC: {q} x B q . 
See [31] for further details. 
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Skorohod metric. Continuous Time Markov Chains and Piecewise Deterministic Markov Processes con- 
sidered in this paper can be seen as random variables on the space of cadlag functions D([0, oo), E) with 
values in E C Q x E™. A cadlag function / : [0, oo) — > P is right continuous and has left limits for any 

t e [0,00). 

The space P([0, 00), P) is given the structure of a metric space by the Skorohod metric. The Skorohod 
metric is first defined on compact time intervals [0, T] and then extended over the whole positive time axis 
[0,oo). 

Consider the uniform metric on the space D([0,T], E), i.e. sup 0<t<T ||xW(t) — x(i)||. If we have a 
sequence x^) of cadlag functions, then they will converge to x in the uniform norm if and only if the 
discontinuous jumps of happen precisely at the same times as those of x (for N > N ). The idea 
behind the Skorohod metric is to allow a small difference in these jump times by re-synchronizing them. 
That is to say, if the uniform metric allows one to wiggle space a bit, the Skorohod metric allows us also to 
wiggle time. To formalize this statement, let to(t) : [0,T] — > [0,T] be a time-wiggle function, i.e a strictly 
increasing continuous function. Call It the set of such functions. Then, the Skorohod distance between 
xjefl([0,T],E) is 

d T (x,y)= inf max{ sup sup ||x(t) - y(w(i))||}. 

ueI T te[o,T] te[o,T] 

The metric oIt is extended to a metric on D([0, 00), E) by discounting large times as follows: 

d(x,y)= Y, 2- K niin{l,d K (x,y)}. 

Ken 

The Skorohod metric defines a topology for which D([0, 00), E) is complete and separable, i.e. it is a 
Polish space. See [51, 9] for a detailed introduction to the metric and its properties. 

We note here that a sequence of functions x^ € D([Q, 00), E) converges to x e D ([0, 00), E) if and only 
if for each T > there is a sequence of time- wiggle functions G It satisfying sup tg [ T j (t) — £|| — > 

and sup 4e[0jT] ||xW(t) - x( W W(t))|| -+ 0. 

Weak Convergence. The notion of weak convergence of a sequence of random variables X( w ' on a 
Polish space £ to a random variable X is essentially the convergence of the induced probability measures 
on E. Weak convergence of probability measures is defined as convergence in the weak* topology [52] . More 
specifically, denote by Cb{E) the set of bounded continuous functions / : E — > E (note that we can have 
E = D([0, 00), Eq); in this case / is usually called a functional), and let P, pW be probability measures on 
E. We refer the reader to [51, 9] for an introduction to the subject. 

Definition C.l. pW converges weakly to P, pW =>■ P, if and only if, for each / e C(E), 



lim / /(x)p( w )(x)dx = / /(x)P(x)<k 



In case we have random variables X,X( W ), then the previous condition can be written as 

Jim E[/(XW)]=E[/(X)]. 

In this case, we write X^ X. 

The Portmanteau theorem provides a set of equivalent conditions for weak convergence of to X: 

1. X( w ) ^X; 

2. lhriN_ ) . 00 E[/(X^ JV ^)] = E[/(X)] for all bounded, uniformly continuous functions / : E — > E; 

3. limsup^^^ P{X( W ) eF}< P{X e P} for all closed sets P; 

4. liminfAr^eo P{X( W ) eG}> P{X e G} for all open sets G; 
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5. limw^oo P{XW eA}= P{X e A} for all X-continuity sets A (i.e., such that P{X e 9,4} = 0). 

Recall that there are other modes of convergence of random variables, among which almost sure con- 
vergence and convergence in probability. These two notions, differently from weak convergence, require 
to have fixed the sample space in which random variables are defined. More precisely, let XjX^ be 
random variables on E, defined on the sample space fl, with cr-algebra A and probability measure P. 
(i.e. X : fl — > E is a A, B measurable function). Then converges to X almost surely if and only 

if P{limjv^.ooXW(a;) = X(w)} = 1, while it converges in probability if and only if, for each S > 0, 
lim^oo P{||XW - X|| > S} = 0. 

These three notions are linked in several ways. Almost sure convergence implies convergence in proba- 
bility, which in turn implies weak convergence. Furthermore, the Skorohod representation theorem states 

that, if XW =>■ X, then there is a sample space (fl,A,F), and realizations X,X l ' of X,XW on fl (i.e. 

- (N) 

X : fl — > E induces the same probability on £ as X) such that X converges to X almost surely. Further- 
more fl can be taken as the unit interval with the Lebesgue measure. In particular, for real-valued random 
variables X^ N \X on E C R, the Skorohod representation can be constructed using the quantile function 
P<", i.e. the pseudo-inverse of the cumulative distribution function F(t) = F{X < t}: X~i N ) = (#))<"([/) 
and X = F^(U), with U uniform in [0, 1] [51]. 

Finally, weak convergence to a deterministic limit, i.e. a random variable concentrating all the probability 
mass to a point of E, implies convergence in probability. 

In the following, we also need the notion of tight probability measure. 

Definition C.2. A probability measure P on E is tight if and only if, for each e > 0, there is a compact 
set K e C E such that P{K £ ) > 1 - e. 
A sequence 

PW of probability measure s on E is uniformly tight if and only if for each e > 0, there is 
a compact set K £ C E such that P {N \K e ) > 1 - e for each N > 0. 

If the space E is Polish, i.e. a complete and separable metric space, then each probability measure on 
E is tight. Furthermore, if pW P, then is uniformly tight. 

Tightness is the right notion to characterize weak convergence in the space D([0, oo),E) equipped with 
the Skorohod topology. Let n^,...,^ '■ D([0, oo), E) — > P fc be the projection at fixed times t\, . . . , tk € [0, oo). 
If X is a random variable in D([0, oo), E), then 7r tl tk (X) is called a finite dimensional distribution. Now, if 
x,xw are random variables in D([0, oo), E), X^ is uniformly tight and Tr tu ...,t k (X( w )) =>■ 7T tli ... itt . (X), for 
tj taken from a subset T C [0, oo) whose complement is at most countable (convergence of finite dimensional 
distributions), then X^ w ' =>■ X. Uniform tightness of X( w ) can be checked using some criteria based on the 
modulus of continuity, see [51, 9] for further details. 

Finally, we need the continuous mapping theorem: Let X^\ X be random variables on E and let 
h : E — > Pi. If /i is X-almost surely continuous (i.e., P{X e C^} = 1, where Cu C P is the set of continuity 
points of /i), and X^) X, then /i(X( w )) =^> /i(X). 

Markov Kernels. A Markov kernel or Markov transition kernel on P, with cr-algebra B, is a function 
R : E x B -> [0, 1] such that 

1. P(-, A) is a measurable function of for each AG B. 

2. P(y, •) is a probability measure for each y € P. 

We now prove a Lemma that will be used in many proofs in next section, that allows us to propagate 
weak convergence by Markov kernels, under a suitable notion of continuity of the kernel. 

Lemma C.l. Let R^(y) = R^ N \y, •) and R(y) = R(y, •) be Markov transition kernels on some Polish 
space E such that R( N \y( N ^) =>■ R(y), whenever y^ — > y. IfY^ Y, where Y^ N \ Y are random 
elements in E, then pW(YW) => R(Y) and (YW, pW(YW)) ^ (Y,P(Y)). 

Proof. The proof is essentially the same as that of the core argument of Theorem 1 in [40] . We reproduce 
it here just for completeness. Fix a bounded and uniformly continuous function g : E — > R. We need to 
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show that |E[ 5 (pW(YW))]-E[#(P(Y))]| -> 0. For simplicity, call i? (JV) <?(y) = J E g(x)R (N) (y, x) dx and 
similarly i?.g(y), and further let P (Ar) (y) = P{Y( W ) = y}, and similarly for P(y). Then 



|E[ 5 (i?W(YW))]~E[ 5 (i?(Y))]| 



< 



+ 



/ i? ( %(y)P (Ar) (y) dy- / Rg(y)P(y) dy 

JE J E 

[ \R {N) g{y)-Rg(y)\P {N) (y) dy 

JE 



(a) 



f Rg(y)P (N Hy) dy - / Rg(y)P(y) dy 



(6) 

The term (b) in the previous inequality goes to zero as the hypothesis imply that Rg(y) is a continuous 
function (see [40]), and so we just need to focus on (a). As E is a Polish space, i.e. a separable complete 
metric space, it follows from pW =>• P and P tight that pW is uniformly tight. Then, we find a compact 
set E e such that P( N \E e ) > 1 - eftWgW^, and so 

/ |P ( %(y)-Pg(y)|P w (y)dy < / |P (JV) s(y) - P 5 (y)|P (JV) (y) dy + e 

JE JE e 

< sup \RWg(y)-Rg(y)\+e^e, 
yeE E 

and therefore 



hmsup / |P ( %(y)-Pg(y)|P (Ar) (y) dy 



as the hypothesis of the lemma imply that R^g converges to Rg uniformly on compact sets [40]. As 
the previous inequality holds for each e > 0, then JjW(YW) =>■ P(Y). To prove the second part of the 
theorem, observe that in the previous computation we can always restrict the expectation to a closed set 
Fi C E, showing that E\g(RW(YW))l Fl (Y^)] -> E[g(R(Y))lp 1 (Y)]. Now, if we fix another closed 
set Fi C E, and choose bounded uniformly continuous functions g p \, 1f 2 , approximating from above the 
indicator function of F 2l we then have 

P{(Y<"U W (Y W )) E F l x F 2 } <E[.g p (i?W(YW))l Fl (YW)], 

from which, fixing p, 

limsupP{(YW,ijW(YW)) e Pi x F 2 } < E[g p (R(Y))l Fl (Y)}, 

jV->oo 

by letting p — > and invoking the bounded convergence theorem, as g p converges to 1f 2 , we have 
hmsupP{(Y (Ar) ,P (JV) (Y (Ar) )) e F 1 x F 2 } < P{(Y,P(Y)) e F 1 x P 2 }, 

N->oo 

which by the Portmanteau theorem [9], implies that (YW, #)(YC))) (Y, i?(Y)). □ 

Functional Analysis. In the following, we will also need the famous Gronwall inequality, which we recall 
here for convenience. 

Proposition C.l. For any real valued integrable function f on the interval [0, T], if 

f(t)<C + D f f(s) ds, 
Jo 

then 

f(T) < Ce DT . 
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D Proof of Main Lemmas and Theorems 



We will start by providing a quick proof of Theorem 4.1, the classic fluid theorem. This will be helpful in 
proving subsequent lemmas and theorems. 

Theorem (4.1). Let {A, 7jv) be a sequence of population- sC CP models for increasing system size jn oo, 
satisfying the conditions of this section, and with all sCCP-actions ir satisfying the continuous scaling 
condition. Let X.( N \t) be the sequence of normalized CTMC associated with the sCCP-program and x(t) 
be the solution of the fluid ODE. 

7/xq — > xo almost surely, then for any T < oo, sup t<T IjX^^t) — x(t)|| — > as N — > oo, almost surely. 

Proof. Intuitively, the result of the theorem is a limit result, hence it depends only on what happens in a 
neighbourhood _B £ (x([0,T]) of the solution of the ODE. Thus, by restricting our attention to a compact 
set K C E containing _B e (x([0,T]) n E for some e, we can assume that all functions g^ defining the rate 
of normalized transitions are bounded, say by B v , and Lipschitz, say with Lipschitz constant L, q . This 
assumption is not restrictive, as we can always extend the functions g v on the whole E so that they are 
globally bounded and Lipschitz continuous. Clearly, x(t) will remain unchanged by this operation, as it 
depends only on the value of g v in K. 

We consider now the representation of the CTMC X( w '(t) in terms of Poisson processes [42]. We will 
use one Poisson process for each transition r\ and each possible value of v v (which is a random element with 
bounded first and second moments). In the following, we indicate with Pr,(w) the probability that v v = w. 



Furthermore, x(t) in integral form is: 



(7) 



x(t)=x + /*^EN ff f)(x( S ))d S (8) 

In the following, we need the notion of centred Poisson process [42], defined by TV (At) = N (At) — At, for 
which the following law of large numbers holds: sup t<T j^N (NXt) — > almost surely. 
Now, we define 

£ W(t) = xW(()-XW(0)-frEK[ s f(x( s ))d s 

Ja r,er 

= E E ¥^ ( N PvM f ^W) d,s) , 



so that 



|e W (i)ll < E E ^(A^(w)iV). 



By the finiteness of second order moments for the previous equation is summable and we can further 
exchange limit and summation over w [42], to conclude that sup t<T — > by the law of large 

numbers for centred Poisson processes. Therefore, we have that 

sup ||XW(t) - x(t)|| < ||XW - x || + sup \\e {N \t)\\ + sup \\F^(±^ N \t)) - F{± {N \t))\\ 

t<T t<T t<T 



+ 



=<5<«>(T)^0 a.s. 

f \\F(±( N Ht)) - F(±(t))\\ da. 
Jo 



53 



Calling 0( N \T) = sup t<T ||X( w )(t) — x(£)|| and applying Lipschitz condition to the last term, we have 
that 

/3 (n Ht) < S (N \T) + L f f3 (N Ht)dt. 

Jo 

By applying Gronwall's inequality (see Proposition C.l), wc finally obtain 

p( N )(T) < 5^ N \T)e LT -> almost surely. 

□ 

We will turn now to prove Theorem 5.1. We will need some auxiliary lemmas. The first one is a 
straightforward generalization of the Kurtz theorem, to the case in which the initial condition of the limit 
process is sampled from a distribution on the space E. 

Lemma D.l. Let X(t) and x(i) be as in Theorem 4-1- Furthermore, assume that |jXo — xo|| converges to 
zero almost surely. Then, for any T < oo, sup 4<T ||X( w )(i) — x(i)|| — >■ as N — » oo almost surely. 

Proof. To begin with, suppose xo is supported on a compact set Kq. Then the proof proceeds as in Theorem 
4.1, with the only caveat that we need to consider a compact set K containing an e-neighbourhood of all 
trajectories starting in K up to time T (which is a compact set, by continuity of the ODE flow). In fact, 
the argument of Theorem 4.1 does not require that the initial condition of ODE is deterministic, but just 
the convergence in probability of X to x . 

Now, as x is tight, for each e > there is a compact set K £ such that P{x ^ K £ } < e. Conditional 
on Xo € K £ , the convergence of X( w )(i) to x(t) is then almost sure. Hence, fix a sequence Ek i 0, such 
that the corresponding K £k t E, and with J2k £ k < 00 • By discarding a set of measure 0, we can assume 
that convergence in each K £k is sure. Then, any other point u of the probability space (Q,A,F) on which 
processes are defined that makes convergence fail has to belong to the complement of K £k infinitely often. 
By the Borel-Cantelli lemma [8], the set of all such u has probability zero. □ 

We will now turn to consider convergence of stochastic jump times, focussing attention on a single jump 
time, given convergent initial conditions of the stochastic and the piccewisc-dctcrministic system. As we 
consider the first stochastic jump time, we can focus our attention on deterministic systems (with random 
initial conditions). In order to do this, we combine simple properties of the space of cadlag functions with 
the Skorohod representation theorem (see Appendix C). 

Proposition D.l. Let x^ N \ x be elements of D([0, oo), E), such that xW — > x (with respect to the 
Skorohod metrics). Then, for each T > 0, x.( N \s) ds -4- J Q r x(s) ds. 

Proof. By the definition of the Skorohod metrics, let Lj( N \t) be a sequence of time-wiggle functions such 
that sup t < T ||w (JV) (*) - i|| -> 0, and sup t < T ||xW(f) - x(u;W(t))|| -> 0. Now, 

T ||xW(s) -x(s)|| ds< / T ||xW(s)-x(a;W(s))|| ds + f ||x(c^(s)) - x(s)|| ds . 
Jo Jo 

s v ' s v ' 

(a) (6) 

The term (a) goes to zero by the uniform convergence of x( N \t) to x(cj^ Af )(i)), while for (b), observe that the 
function g^ N \t) — ||x(u;W(t)) — x(i)|| goes to zero in every continuity point of x, thus almost everywhere. 
Furthermore, in [0, T] the function x is bounded by a compactness argument (see [9]), and so is g^ N \ so 
that we can apply the bounded convergence theorem [8] to conclude that (b) converges to zero. □ 

Jump times. In order to show convergence of jump times of discrete stochastic transitions, we need the 
notion of cumulative rate for the PDMP A(t) and for the CTMCs. 

Consider discrete stochastic actions it e disc(„4), and the rate of firing of discrete actions in the PDMP, 

H*) = Enedisc(A) M x ), and in the CTMC at level N, AW(x) = £ 

7redisc(.A) ^^(x). Then, the cumulative 

rate for the PDMP x is 

A(t) = / A(x( s )) ds, (9) 
Jo 
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while for the CTMC XW(i) is 

AW(t) = f AW(xW(s)) da. (10) 

These are the cumulative rates of non-homogeneous Poisson processes. We are interested in the first 
firing time, whose cumulative distribution function is given by 1 — e _A (*). By a standard inversion method 
[8, 64], the first firing time for the PDMP is given by r = mf{t > | l-e~ k ^ >U} = inf{i > | A(i) > £}, 
where U is a uniform random variable and £ is an exponentially distributed random variable with rate 1 (it 
holds that £ = —logU). We assume inf0 = oo. Similarly, we can define = inf{t > | A^ N \t) > £}, 
the first firing time of a discrete transition tt G disc(_4) in X("'(t). By the Skorohod representation theorem 
for unidimcnsional random variables (see Section C or, for instance, [51]), if the pointwise convergence of 
A( N \t) to A(t) holds, then — > r almost surely. We can combine these facts with Proposition D.l, to 
prove the following lemma. 

Lemma D.2. Let ~X.( N \t) x(t), and t( n \ t be defined as above. If A is continuous and \( N ^ — > A 
uniformly in each compact set K <Z E, then t^ n ^ t 7 as N ^ oo. 

Proof. The first step of the proof is to use the Skorohod representation theorem to construct realizations 
X^ of and X of X on some probability space V such that X^ — > X almost surely, as random 

elements in the space of cadlag functions. 

It then follows that A' w '(x' ') — > A(X) almost surely. In fact, consider the time-wiggle functions 
(depending also on the sample space (Q,A,F), i.e. wl") = cj^ N \u,t), for ueO) such that — > id and 

Y W = X W o W W -> X uniformly in [0, T]. Then 

sup ||AW(X W ( W W(t))) - A(X(t))|| < sup ||AW ( Y W W ) - A(Y W (i))|| 



(a) 

sup||A(Y W (t))-A(X(t))|| 

t<T 

v v ' 

(«0 



Term (a) goes to zero by uniform convergence of AW to A (as in [0,T] X 1 ' and X are contained in a 

- ( AT) 

compact set), while term (b) goes to zero due to uniform convergence of Y to X in [0,T] and uniform 
continuity of A in [0, T] . 

Now, we can apply Proposition D.l to AW(X W ) -> A(X), to conclude that A< W )(T) -> A(T) almost surely 
for each T > 0. Combining this with the Skorohod representation theorem for real random variables, 18 we 
get f — > f almost surely, where and are the jump times obtained from the realizations of the 
original processes. It then follows that =>■ r. □ 

We finally consider the convergence of states at times and r. 

Proposition D.2. Let X^ =>■ x and iet t^ n \t be stopping times satisfy conditions of the previous lemma. 
Then, conditional on t < oo, X.^ n \t^ n ^) =>■ x(t). 

Proof. In fact, let — > t < oo (here we use implicitly the fact that r < oo). Then, use Skorohod 
representation theorem and take representations and X of and X such that X^ — > X almost 

- (TV) 

surely. By continuity of X and uniform convergence of X and X in [0, T] (the Skorohod metrics and 
the uniform metrics on compact sets are the same when the limit function is continuous), it follows that 

X (JV) (tW) _> X(i) almost surely, hence XW((W) x(i). Hence, by seeing XW and X as Markov 
kernels, we can apply Lemma C.l to conclude. □ 



18 We are effectively coupling X^', x, t( n > and r on the probability space f2 X [0, 1]. Note in particular that we allow r 
and to take the value oo. This can happen with non-null probability if and only if A(T) does not diverge as T — » oo. 
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In order to prove Theorem 5.1, we will use an inductive argument whose core is the following corollary, 
which combines the previous results in the light of weak convergence. 

Corollary D.l. Let (A, 7jv) be a sequence of sCCP models for increasing systems size, satisfying the 
conditions of this section, and with all actions n satisfying the continuous scaling condition. Let XW (t) be 
the associated sequence of normalized CTMC and x(t) be the solution of the fluid ODE. 
IfX ( N) => x 0) then 

1. XW =>■ x, as random elements in the space of cadlag function on E, with the Skorohod metrics. 

2. If t^ n \ t, are the jump times of a stochastic event with rate \( N *> and \, respectively, then => t; 

3. XW(rW)^x(r); 

4- If R( N \y) and R(y) are reset kernels satisfying i^W (yW ) =>■ R(y) whenever yW — > y, i/ien 
#)(XW( T W)) ^i?(x(r)); 

5. Under the previous conditions, 

Proof. The proof works simply by constructing an a.s. convergent realization of the initial conditions. 
Then, by Lemma D.l, we obtain point 1. Point 2 follows from Lemma D.2 and point 1, while point 3 from 
Proposition D.2 and point 2. Point 4 follows from Lemma C.l and point 3. Point 5, instead, follows again 
from Lemma C.l, observing that each element on the vector is defined conditionally on the previous one (e.g. 
XW depends conditionally on X^', r^ w ^ depends conditionally on X'"', and so on), and this dependence 
satisfies the assumptions of the Lemma. For instance, if yW — >■ y, then (X^IXq^ = y^ N ^) => (x|x = y), 
and the dependency is measurable (in fact, continuous) on y^ N \ y. Similar observations hold for the other 
elements of the vector. Then an iterated application of Lemma C.l is enough to conclude. □ 
We can now prove Theorem 5.1. 

Theorem (5.1). Let (A, Jn) be a sequence of population- sC CP models for increasing system size — >■ oo, 
satisfying the conditions of this section, with variables partitioned into discrete X^ ; continuous X C; and 
environment ones X e . Assume that discrete actions satisfy scaling 3 and continuous actions satisfy scaling 
2. Let ~X.( N \t) be the sequence of normalized CTMC associated with the sCCP program and x(i) be the 
PDMP associated with the limit normalized TDSHA T(A). 

If x ( N) x (weakly) and the PDMP is non-Zeno, then X(t) converges weakly to x(t), X =>■ x, as 
random elements in the space of cadlag function with the Skorohod metric. 

Proof. The basic idea of the proof to show weak convergence is to apply inductively the previous corollary, 
to show weak convergence of X.m\t) — X.( N \t A 7^.\) to x m (t) = x(i A r m+1 ), i.e. to processes stopped 
after m + 1 jumps, and then lift this to the full weak convergence. 



Step 1: weak convergence conditional on m jumps or less. Consider the sequence t\,T2, ■ ■ ■ of jump 
times of discrete stochastic transitions in the PDMP x(i) and the sequence t[ N \t2 N \ ... of jump times of 
discrete transitions tt e disc(_4) in ~K^ N \t). In the following, we need to take care also of the fact that r m 
may be infinite with probability greater than zero. Note that, conditional on r m being infinite, all r m+ j will 
be infinite, too. 

In order to be more concise, let us introduce some additional local notation. First, denote — 
XW (rjh )+ ) and z m = x(r+) the states of the CTMC at level N and of the PDMP after the m-th discrete 
jump. If r m or ri ' are infinite, we assume z m or zt, ^ be equal to a special value (q^,0), where q& is a 
special state of Q where nothing happens: the vector field and the jump rate are null (i.e. it is a cemetery 
point). Note that (<7a,0) has distance 1 from any point (q, x) in E. 

Let also Z { Q N) = ± { N) , z = x , t^ N) = t = 0. We define Y^°(i) to bc thc CTMC starting from 
with no discrete jumps, and y m (t) the PDMP starting in z m with no discrete jumps (in fact, an ODE with 
random initial conditions). Notice that, if (resp. r m ) is finite, then Ym\t) (resp. y m (t)) coincides 
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with XC'lri"' + t) (resp. x(r TO + t)) for rif^ < t < t^i (resp. r m < t < r TO+ i), by the strong Markov 
property of CTMC [49] and of PDMP [31]. 

We will now prove that, for each 772 > 0, conditional on T m <C 00, Yrn ^ =4* and tJ^i^ = ^ T 7n _|_i. 
Moreover, if T m+i < 00, then also Z m+i z m +i. Finally, we will also show that, conditional on r m+ i < 
°°, (Zo W \y^\t^\z^\...,Y^\t^,Z^ 1 ) (z ,yo,Ti,zi,...,y m ,r m+ i,z TO+ i). The argument is 
a simple induction. In particular, the induction hypothesis is that (Zq N \ Yq N \ t[ N \ z[ N \ . . . , Z^) 
(zo, yoj T ii zi, . . . ,z m ) conditional on r m < 00. 19 From this, Y^ =>• y m is immediate from Lemma D.l, 
and t^_\ r m+ i follows from Lemma D.2. Now, conditional on r m+1 < 00, we can apply Proposition D.2 
to conclude that Z TO+ i =>■ z m+ i. As r TO+ i < 00 implies r TO < 00, reasoning as in Corollary D.l (using the 
same argument there to extend inductively its length), we obtain the weak convergence of vectors of the 
random elements. 

Consider now ±£\t) = XW^Ar^) andx m (i) = x(Mr m+ i). Wc can write X^' (t) = £™ Y<{ N) (t- 

rf ))I {r f ) < t < + 1(7^ < t}zffii ^d x ro (t) = EHo^ - n)l{n < t < r t+1 } + I{r m+1 < 

t}z m+ i. Now, the functional that associates with T the cadlag element I{?; < T} is continuous with re- 
spect to Skorohod metrics, and if we consider the previous definitions of iC^, ^ and x m as a function of 
(Zq^, Yq W \ t[ N \ z[ N \ . . . , Z^ 1 ) and (z , yo, ti,zi, . . . , z m+1 ), respectively, then this function is continu- 
ous. Hence, conditional on r„ l+ i < 00, X^ x m by the continuous mapping theorem. The same property 
holds also when r m+ i = 00. In fact, conditioning on tj < 00 and tj+i = 00, for j < to, we observe that 
the process Wj(t) = Ei=o y»(* — T i)I{ T i < * < r i+i} + I{* > T j}yj(t — T j) coincides with x TO (i), and by the 
same argument above, applied to vectors (Z^ Q N \ Yq N \ t[ N \ z[ N \ . . . , and (z ,yo, n, zi, . . . , r^+i), the 
processes wf\t) = £^0 ^(t-rf 'jl^ < * < r^} + l{t > rj N) }Yf '(i-rf) converge weakly 
to Wj . Now, the processes are the same up to time rj^ , which is a divergent sequence 

under the event {tj < oo,r, +1 = oo} 20 . This implies that X^ converges weakly to W^ (in fact, their 

Skorokhod distance converges weakly to zero, in fact a.s. under any a.s. realisation of — > 00), and 
hence, by uniqueness of the limit, to Wj = x m . Now, as the events {tj < 00, Tj+i = 00}, for j = 0, . . . , m 
and {r m+ i < 00} are disjoint and their union has probability one, we can remove the conditioning and 
conclude X^ x m . 21 

Step 2: Weak convergence. We now lift the weak convergence X^' =>■ x m to weak convergence 
of the full processes X^) =>■ x. Consider a bounded uniformly continuous function / from the space 
D = D([0, 00), E) of cadlag functions with values in E to E. By the definition of the Skorohod metric d in 
D, for each p > 0, there is a T > such that d(x, y) < dx{x, y) + p, where dx is the metric restricted to the 
compact time interval [0,T] (see Section C). By the uniform continuity of /, given e > 0, we fix a p > 
such that \f(x) — f{y)\ < e/4 whenever d(x,y) < p. 

Now, fix T > according to the previous condition on p, and choose m such that P{r m+ i > T} > 1 — 



9 In particular, this implies that z m 7^ (gA>0), and ultimately also Z^J^ ^ ((7a> 0). 

20 In fact, => Tj+i conditional on tj < 00. Moreover, if Tj+i = 00, it also holds that => Tj + fc for any fc > 0, as 

= 00 and > T j+l 00 ■ This means that by induction we can conclude => Tj for any j. 

21 To see this more precisely, couple all processes W^', Wj, x m , and X^', and the exit times tj on a common space f!, 
and let Qj the subset corresponding to the event {tj = 00, Tj_i < 00}, for j = 1,. . . ,m, and f2o be the subset corresponding 
to the event {r m < 00}. Clearly Qj, for j = 0, ... ,m, form a partition of Q. Let P be the probability measure on Q, let 

fij the push-forward measure on the space of cadlag functions on E of Wj, i.e. of x m conditioned on f2j, and let Hj N ^ be 
the push-forward measure of conditioned on Qj, for j such that pj = P(f!j) > (call J such a set of indices). We 

know Hj N ^ => iij. Then the push- forward measure of x m , coincides with X)jgj VjV-j : an( i similarly fi( N \ the push- forward 
measure of X^', is j PjfJ-j N ^ ■ Now, let F be a closed set of the cadlag space Z?([0,oo),E. Then 

limsup^ JV (F) < Vpjlimsup/i^^F) < Vpj/ijfF) = n(F), 
j e •/ 3 e J 

which implies (iC) => /1, and hence X^' => x m , by the Portmanteau theorem. 
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e/(16|j/|j), which can be found since the expected number of discrete transitions fired by the PDMP at time 
T is finite. As t, 



(N) 
m+1 



r m+ i, we can also find an A such that, for all A > A , P{r^. ) 1 > T} > l-e/(8||/||) 
(using the liminf condition in the Portmanteau theorem). 

Now, conditioning on r m+ i > T, we have that x(i AT) = x m (i A T), and so d(x,x m ) < p. Similarly, 
conditioning on > T, we have XW(i AT) = ± { "\t A T) and d(XW, x£J°) < p. Now 



E[/(X 



(JV)i 



E[/(x)] < E[|/(XW)-/(X, 



(a) 



-E[|/(x)-/(x, 

S v 

(b) 



+ 



E[/(XW)]-E[/(x ro )] 



(c) 



Now, term (c) goes to zero as xlf' 
tion, we have 



x m . To bound (b), instead, using properties of conditional expecta- 



E[|/(x) - /(x m )|] = E[E[|/(x) - /(x m )| | I{r m+1 > T}]] 

< E[|/(x) - /(x ro )| | I{r m+1 > T} = 1] • P{r m+ i > T} + 2||/||P{r m+1 < T} 

< E[|/(x) - /(x ro )| | I{r m+1 > T} = 1] +e/4 < e/2, 

where the last inequality follows from the choice of p and the fact that (i(x,x m ) < p. A similar argument 
can be used for term (a), allowing us to conclude that 

limsu P E[|/(XW)-/(XW)|]< £ /2, 

jV->oo 



from which we have 



lim sup 

N—>cc 



E[/(XW)]-E[/(x)] 



< e. 



By the arbitrariness of e > 0, wc can finally conclude that 



E[/(XW)]-E[/(x)] 



0. 



□ 



D.l Instantaneous transitions 

In this subsection, we give the proofs of results contained in Section 6 of the paper. 

Lemma (6.1). Let (A,"/n) be a sequence of populations C CP models for increasing population size. Let 
X.( N \t) be the associated sequence of normalized CTMC, and suppose X^ =>■ X, where X has a.s. con- 
tinuous sample paths. Let h^ N \ h be activation functions for ~SS N "> and X ; such that ftW — » h uniformly, 
and suppose h is transversal to X. Then £W =>■ (. 

Proof. First of all, use the Skorohod representation theorem to construct representations XW of and 
X of X such that X^ — > X almost surely. Now fix sample trajectories x^ — > x in the Skorohod metrics. 
As X is almost surely continuous, we can assume x continuous. In this case, the Skorohod metric is the 
same as the uniform metric on compact sets [0, T]. In particular, we can take T larger than £ + S, for 
any 5 > 0, where ( is the exit time for x. Now, since h is transversal for X, there is a S > such that 
h{x(t)) > 0_for t € (C,C + <5]. Let ft = min{max{-ft(x(f)) | te [C - 8, £]}, max{/i(x(i)) | f e [C,C + ^]}}- Fix 
£ > 0, e < ft, and let (~ = sup{t < ( | h(x(t)) < -e} and C+ = mf{t > C I h{k{t)) > e}. By continuity of 
x, it follows that ||ft(x(i)) - ft(x(£))|| < e for any t e (C~, (+ ), and that (+ -t ( as e -> 0. 

Now, choose a compact set Ami? that contains the ^-neighbourhood of x in [0, T], for T > £+. As 
ft is uniformly continuous in if, pick a p > such that ||ft(xi) — ft(x2)|| < e/4 whenever ||xi — X2|| < p, 
and fix A > such that x^^t) is p-close to St(t) for A > A , uniformly in [0, T]. Furthermore, find 
Ax such that, for A > A 1; sup ieK ||ft (JV) (x) - ft(x)|| < e/4. Let A = max{A , AJ. It follows that, for 
A > A, \\hW(*W{t)) - ft(x(t))|| < \\hW(i.W{t)) - ft(xW(f))|| + ||ft( X W(t)) - ft(x(i))|| < e/2, and so 
ftW(xW(i)) < for t G [0,C e ~], hence C {N) > C«T- Furthermore, ftW(xW(C+)) > 0, and so C (N) < C ■ lt 
follows C (iV) -> C a.s., and therefore C (JV) =^ C- □ 
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We now turn the attention to resets of instantaneous guards, proving a version of Lemma C.l dealing with 
the discontinuities in the reset kernels under some regularity assumptions. We first recall some notation. 
Let p\ N \ pi be the weight functions, with pi continuous and p\ N ^ uniformly convergent to pi on each 
compact set K C E. Furthermore, let p(x) = Sjp»(x), and similarly pW(x) = J2iPi N \*-)- Let r[ N ^ 
and Ri be the reset kernels associated with the instantaneous transitions satisfying ij^^x^) => i?i(x) 
whenever — > x. Finally, let h\ N \ hi be the activation functions of guards, with hi continuous and h\ N ^ 
converging uniformly to /ij. The further properties that are required for the activation functions are the 
following: 

• Each hi is a robust activation function, according to Definition 6.1. 

• The PDMP is robustly transversal, see Definition 6.3. 

• The set of activation functions hi enjoys the size-compatibility property, see Definition 6.6. 

• The PDMP has the robust activation property, as stated in Definition 6.5. 

Consider now the activation function for the PDMP, defined by h(it) = max{fti(x), . . . , 
ft m (x)}, and let H = {x | h(x) = 0} be the activation surface of instantaneous transitions. %W is 
defined similarly. Furthermore, let Hi = H n {x | /i,(x) = 0} be the portion of H in which transition i 
is active. Call Z), = d-uHi the boundary of Hi in H and D = \J™ =1 Di. The robust activation property 
implies that the probability of jumping from D is zero. Furthermore, let Idep be the index of size-dependent 
activation functions, i.e. such that ftW ^ h, and I in d the index set of size-independent activation functions. 
The size-compatibility condition states that, for each i e Idep and x € int-u(Hi), hj(x) ^ for j ^ i, i.e. 
only hi is zero. 

Finally, recall the definition of the reset kernels on H^ and H: 

m 

i?W(x,.) =^l{/ i f)(x) > 0}(pf»(x)/pW(x))Jif»(x,), 

and 

m 

i?(x, •) = > 0}(Pi(x)/p(x))i2i(x, •)• 

i=l 

Under the previous hypothesis, we can prove the following lemma. 

Lemma D.3. Let R( N \y) and R(y) defined as before and let Y^ =>• Y, where Y^ N \ Y are random 
elements with support in H^ and H, respectively, such that P{Y e D} = 0. Then R( N ^ (Y("l) i?(Y) 
and (YW.JjWfYW)) => (Y,i?(Y)). 

Proof. Fix a bounded and uniformly continuous function 5 : £7 — > R. We need to prove that \E[g(R^ N ^ (Y^))] — 
E[g(i?(Y))]| — > as N — > 00. We use the same notation as in Lemma C.l. The idea is to split the integrals 
J E R( N ^ g(y)P( N ^ (y) dy and J E Rg(y)P(y) dy into several regions, surrounding the discontinuity points by 
a small probability region in such a way that the probability mass is concentrated on a continuity region, 
in which we can apply Lemma C.l. There are some technical details that we have to work out, as Y^ 
and Y are concentrated on a manifold of E. 

Let 8 > (to be fixed afterwards) and K C E be a compact set. Call D$ = \J ±eD Bs(x) the 6- 
neighbourhood of D. Clearly, D$ I D as 5 I 0, and therefore P(Dg) I 0. The same holds for the closure 
TT S : P(Ds) 1 

Consider now a size-dependent activation function, i <G Idep, and let Hi.s = Hi H D c s . By the size- 
compatibility condition, it follows that |^(x)| > for each x e Hi.s and each j ^ i. In particular, 
d(x,Hj) > for each x £ Hi t s, where the distance between a point and a set is defined in the usual way 
as d(x,A) = infy £ ^rf(x,y). Now, notice that H it s is closed and so Hi 5 n K is compact. Therefore, by 
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continuity of the distance d, there is a pij > such that d{x.,'Hj) > pij for each xeHijfl K. 22 Let now 
Si = minj^j pi.j/2, and notice that, for each x £ ti^s H K and y € Bg i ('k), we have d(y,T-Lj) > Si > 0. Let 

^4i,5 = Uxe«i s nK C £>f, then d(y,7ij) > Si for each y e A,^. It follows that hj(y) > in Ai t s, 

which is compact, so that we find a pi > such that ||/ij(y)|| > p% for y e A i( 5 and each j 7^ i. 

By possibly invoking uniform convergence of hj N ^ to hj, for j e 7 dep , the property of A it $ allows us to 
conclude that, for TV large enough, h^ N \y) 7^ in Ai.s- Furthermore, by the robust activation of hi, h\ N ^ 
ultimately changes sign within Ai.$. In particular, combining this with the fact that is supported in 

WW and Y is supported in %, we get that in Ai t g, R( N ^ coincides with and R with R4, hence they 

satisfy the continuity property R^ N \y^ N ">) — > P(x) as — > x. 

We can deal similarly with size-independent activation functions hj, j G l% n d- In this case, however, we 
may have more than one guard robustly active in H, so we really need to consider each possible activation 
profile. Let a be a boolean vector, a e {0, l} m , such that on = for i e Id ep - Call Ji n d the set of such 
vectors. Then we can define T-L a = % fl Hj-a =1 ^j- I n (%a); hi(x) =/= if and only if = 0, hence we 
can reason as for the size-dependent case to construct an open neighbourhood A a j of Ji a n D| in which 
ft^(x) 7^ for N large enough and all x e A a ,s- Since = hj for <x, = 1, and since pW and P are 
supported in "H Q , when restricted to A a j, we can conclude that R^ N \x^ N ^) — > P(x) as xW — » x in A a> ,5. 

Recall the definition of Ji n d, and let J^ep be the set of boolean vectors a <G {0, l} m equal to one only for 
a single i € id ep; and zero elsewhere, and J = J in d U Jdep- For each compact K and (5, we have constructed 
an open set As = \J aeJ A a ^s such that R^ and R behave nicely in it. 

Now, fix e > 0, and, invoking the uniform tightness of pW and P, choose K £ compact such that 
P (N) {K e ) > 1 - e/^lffHoo. Furthermore, pick 5 > such that P(D~ S ) < e/AWgW^. Then we have 



|%(PW(YW))] - E[ 5 (P(Y))]|< 



/ p(%(y)pW(y)dy- / P. 9 (y)P(y) dy 



(a) 



+ 



+ 



+ 



/ pW 5 (y)pW(y)dy- / P ff (y)P(y) dy 



(6) 



/ i? ( %(y)P (A °(y) dy- / P. 9 (y)P(y)dy 



(c) 



/ i? (Ar) . 9 (y)P (W) (y)dy- f P. 9 (y)P(y)dy 

•/iff Jjf! 



(d) 

Now, term (a) goes to zero invoking Lemma C.l, given the continuity of resets in A$. To deal with term (b), 
notice that K £ \(A S UD 5 ) is closed and P(K e \(A s UDs)) = 0, sothat iimsu Pjv p( N 1 {K £ \(A 5 UD S )) = 0. Term 
(c) is dealt with by observing that limsup^ p( N \D$) < P{D$) < e/4||(/|| 00 , and so Urn sup N P^ N \Dg) < 
e /4||5||oo- Therefore (c) is less than e/2. Finally, (d) is less than e/2 by the choice of K s . It follows that 



limsup|E[ ff (P (Ar) (Y (Ar) ))] 

N-s-co 



E[g(R(Y))}\ < e, 



which implies |E[ 5 (pW(YW))] - E[g(R(Y))}\ 
copied verbatim from Lemma C.l. 



0. Proof of the second statement of the theorem can be 

□ 



We now give the proof of Theorem 6.1. 

22 Let / : K — > R be a continuous function on a compact set K such that |/(x)| > for each x g K. Then there is e > 
such that |/(x)| > e for each x 6 K. Suppose not, and choosing e = 1/n, construct a sequence x n such that /(x n ) < 1/n 



and so /(x„) 
contradiction. 



0. By compactness of K, extract a convergent subsequence x„ fc — > y 6 iiT. Then = limj. /(x„ fc ) = /(y), a 
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Theorem (6.1). Let (A,Jn) be a sequence of population- sC CP models for increasing system size 7^ — > 00, 
as N — > 00 , with variables partitioned into X = (Xd,X c ,X e ), with discrete stochastic actions satisfying 
scaling 3, instantaneous actions satisfying scaling 4, and continuous actions satisfying scaling 2. Let XW (t) 
be the associated sequence of normalized CTMC and x(£) be the limit PDMP associated with the normalized 
limit TDSHA f(A). 

If x ( N) => x (weakly) and the PDMP is non-Zeno, robustly transversal, has the robust activation 
property and it is size-compatible, then X.( N \t) converges weakly to x(i), X( w ' =>■ x, as random elements 
in the space of cadlag function with the Skorohod metric. 

Proof. The argument closely follows the proof of Theorem 5.1. The only difference is the definition of jump 
times T 4 (A ° and T t . In this case, in fact, these are defined as the minimum of stochastic jump times tv 
and Ti (conditional on having observed i — 1 jumps) and instantaneous jump times and d (conditional 
on having observed i — 1 jumps). Now, as t-^ =>■ n by Lemma D.2 and =>■ ^ by Lemma 6.1, by the 

continuous mapping theorem it follows that — mm{T^ N \ Ci^} min{T i; ^} = Ti. 

This allows us to extend Corollary D.l, by replacing t with T^' T in point 2, and then 

showing ~K( n )(T( n ^) x(T) (use Proposition D.2 conditional on T < 00). As for convergence of the 
state after the reset, notice that YW = XW(TW) and y = x(T) satisfy the conditions of Lemma 
D.3. Moreover, we have flW(YW) = Ri N) (YW)I{t/ W) < cf°} + i?f '(Y^'llfrf 1 > cf°}, and 
R(y) = R s (y)I{ n < Q} + Ri(y)I{Ti < Q}, where R ( S N) (YW) and i? s (y) are the resets kernels for 
stochastic jumps (constructed from instantaneous transitions) and R^ N \Y^) and Ri(y) are the resets 
kernels for the instantaneous jumps. 

Both satisfy ^(Y^) => R s (y) and fl^YW) => i? 4 (y), as Y^ -> y. Now, as P{Ci = n} = 0, we 
can apply the continuous mapping theorem first to the indicator functions I{r < £} and I{r > £}, to show 
that I{t^ N) < C 4 (Ar) } =► I{n < (J and I{rf JV) > C, (JV) } =► I{t, > and then to the definition of R^ and 
i?, to show that i?W(YW) i?(y). 

Reasoning similarly to Lemma C.l, we have then proved the equivalent of point 4 and 5 of Corollary D.l. 
Then, the proof of the theorem works as in Theorem 5.1. □ 



D.2 Guards in discrete stochastic transitions 

We turn now to prove convergence in the presence of guarded discrete stochastic transitions. As discussed 
in the paper, there are two main issues to deal in this case, caused by the introduction of discontinuities 
in the rate functions. The first is the convergence of jump times, the second is the convergence of states 
after the resets. Both points require an additional regularity property of the PDMP, namely the robust 
compatibility with respect to guards of discrete stochastic transitions. 

We start by showing convergence of exit times. Recall that we have m, say, discrete stochastic transitions, 
with rate functions \\ N \ Aj, and activation functions h\ N ^ and hi associated with guards, such that Aj and 
hi are continuous, and \\ N \ h[ N ^ converge uniformly on compact sets to A^ and hi, respectively. Let 
AW(x) = E^iJ-fX-^C*) > 0]A (Ar) (x) and A(x) = ££=1 > 0}A,(x). Furthermore, we consider 

the following discontinuity surfaces: = {h\ N) (x) = 0} and Hi = {/i,(x) = 0}, H (N) = {JZiK^K and 

We then can prove the following 

Lemma D.4. Let X^' =>• X, as random variables in the space of cadlag functions, with~X. a.s. continuous 
and such that, with probability 1, {t <E M. + | X(i) £ H} has Lebesgue measure 0. Let tW, t be the jump 
times associated with rates \( N ~) and A defined above. Then t( n 1 =>■ r, as N — > 00. 

Proof. We first prove that, if x^' -)xas elements in the space of cadlag functions, x is continuous, and 
A(x(t)) is almost everywhere continuous, then AW(T) = /^^(iWfi)) dt -> J T A(x(i)) di = A(T) 
for every T > 0. In fact, for each continuity point y of A we have that X^ N \y^ N ^) — > A(y) as y^* 1 — > 
y. It follows that, for each t > such that A(x(t)) is continuous, then X^ N \x^ N \t)) — > A(x(t)), as 
x.( N \t) — > x(i) by continuity of x. Therefore, A^^xW) — > A(x) pointwise almost everywhere in [0,T]. 
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Furthermore, by continuity of A,, we have ^(i'"') — > Aj(x), hence {A^ (i^'), Aj(x)} is relatively 
compact in the space of cadlag functions, and so bounded uniformly. It means that there is Mj > 
such that HAf^xW^))!! < Mi and ||Ai(x(t))|| < M t . But as jiW(xW(i)) < £. ^(^(t)) and 
< Ei^i(x(*)), h follows that A^ (xW), A(x) are bounded by X^ M *- Hence, by the bounded 
convergence theorem, AW(xW(i)) di -> j* A(x(t)) dt for every T > 0. 

Now, the statement follows by applying the Skorohod representation theorem, as in Lemma D.2. Let 

X^ X be representations of X'"', X on a probability space (fl, -4,P) such that X^ ' — > X almost surely. 

~ (JV) 

Hence, due to the hypothesis, for oj in a subset of probability 1 of fi, we have that X (oj) — > X(w), X(w) 
is continuous, and A(X(u;)) is almost everywhere continuous. Then we can apply the previous argument to 



(w), X(w), and conclude that A^ N \co,T) converges pointwise to A(uj,T) for each T, from which 



get a.s. convergence of the representation of jump times rW and f. Hence r 



(JV) 



r, as desired. 



□ 



We turn now our attention to reset kernels. We will extend Lemma C.l to deal with the discontinuities in 
the limit kernel using the hypothesis that there is zero probability of being in a discontinuous state when we 
jump. Recall the definition of )S N \ A, hff^ and hi, and further let R\ N \ Ri be the reset kernels, satisfying 
i?f°(xW) => J2i(x), for each xW -> x. Then the full reset kernels are flW(x,-) = E£Li I^f ^) > 
0}(Af ) (x)/A(^)(x))i?f } (x, •), and i?(x, •) = YZi HW*) > 0}(A i (x)/A(x))i? i (x, ■). 

Equipped with these definitions, we can prove the following lemma. 

Lemma D.5. Let R( N \y) and R(y) defined as before and let Y^ =>• Y, where Y^ N \ Y are random 
elements in E such that¥{Y e H} = 0. Then i?W(YW) i?(Y) and (Y^, i?W(YW)) (Y,i2(Y)). 

Proof. The proof is based Lemma C.l, with additional arguments taking care of the discontinuities in iiW 
and R. Fix e > and a bounded and uniformly continuous function g : E — > R. By the same argument 
of Lemma C.l, {Y^^Y} is uniformly tight, and so there is a compact set K E such that P^ N \K £ ) > 
1 — £/4||g , || (X) for each N, and P{K £ ) > 1 — e/4|| (7||oo ■ Furthermore, for S > 0, let "H^ be the closed <$- 
neighbourhood of Hi, defined by Hi. s = {Jiten where B$(x) is the ball of radius 5 centred in x. Let 

also Hs = Ui%,«- Clearly H s I H for 6 1 0, and so P(H S ) I 0. Choose 5 such that < e/AWgW^,. We 

have that 



|E[ 5 ( J RW(YW))]-E[«?(i?(Y))]| < 



/ R {N) g(y)P {N) (y) dy - / Rg{y)P(y) dy 



(a) 



+ 



+ 



I R (N) g(y)P (N) (y)dy- f Rg(y)P(y) dy 

JH 5 JHs 



(b) 



R (N) g(y)P {N \y) dy - / Rg(y)P(y) dy 



(c) 



Now,the reset kernels i?^ and i? in K £ n 7^ satisfy the continuity property R^ N \x^) =>■ i?(x) as 



x, for xW,x G K s C\Hy This follows because, by uniform convergence of ftj^ to /ij on the compact 



A (JV) 

set K e and the fact that K s n "H^ does not contain any discontinuity surface, if xW -+ x, then x^ will 
ultimately satisfy the same guards as x, i.e. Ij/i^^xW) > 0} — > I{/i,(x) > 0}. Then convergence of the 
reset kernels follows as in the unguarded case. This means that we can apply Lemma C.l and conclude that 
term (a) goes to zero. 

As for term (b), notice that Hs is closed, hence limsupjv^^ P( N \H$) < P(H$) < £/4||<7|| (X) by the 
Portmanteau theorem. Finally, term (c) is less than e/2 by the choice of K e and the fact that Rg and R^g 
are both bounded by y^. Hence we have that limsup^^ |E[.g(i? (A,) ( Y(Ar) ))] - E [ff(^( Y ))]l < £, whi ch 
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implies convergence to zero by the arbitrariness of e. This proves R^ N \Y^) R(Y). The second part 
of the statement, instead, follows as in Lemma C.l. □ 



We are finally ready to prove proposition 7.2. 

Proposition (7.2). Let [A, Jn) be a sequence of population- sC 'CP models for increasing systems size 
7tv — > oo, as N — > oo, with variables partitioned into X = (Xj, X c ,X e ), with discrete stochastic actions 
satisfying either scaling 3 or scaling 8, no instantaneous actions, and continuous actions satisfying scaling 
2. Let ~XS N \t) be the associated sequence of normalized CTMC and x(i) be the limit PDMP associated with 
the normalized limit TDSHA T(A). 

Ifx ( N) => x (weakly) and the PDMP is non-Zcno and robustly compatible, then X^ (i) converges 
weakly to x(i), X( w ) =>■ x, as random elements in the space of cadlag function with the Skorohod metric. 

Proof. The proof proceeds essentially as that of Theorem 5.1. The only difference is that we have to replace 
Lemma D.2 with Lemma D.4 and Lemma C.l with Lemma D.5 in corollary D.l and in the proof of Theorem 
5.1. To do this, we just need to show that the robust compatibility of the PDMP guarantees the satisfaction 
of the hypothesis of the two lemmas. This is trivial for Lemma D.4, as robust compatibility is an explicit 
hypothesis. The condition of Lemma D.5, instead, holds because robust compatibility of the PDMP x and 
the absolute continuity of the exponential distribution with respect to the Lebesgue measure imply that the 
event {x(r) G %} has probability zero. □ 
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